Question about RL agents controlling other RL agents
Hi, I'm a beginner in the field of reinforcement learning, currently interested in physics-based motion control.
While looking at various well-known environments such as the Robot Arm, I started wondering how one would tackle a physics-based environment where such a model has to achieve complex tasks that are more abstract than simply reaching a certain destination. The question occurred to me while reading this paper, with the image of the problem scenario shown below.
For example, say I were to create a physically simulated environment where a robot arm aims to perform well in an online 3D bin packing problem (3D-BPP) scenario: the arm grabs boxes of various sizes from a conveyor belt and places them onto a designated spot, trying to fit as many of them as possible into a constrained space. (I guess I could model the reward as something related to the volume of the placed boxes' convex hull?)
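For what it's worth, the convex-hull idea seems easy to prototype. Here's a minimal sketch of what I mean, assuming the simulator can report the corner coordinates of every placed box; the function name and the normalization by total box volume are just my guesses at a sensible density-style reward:

```python
import numpy as np
from scipy.spatial import ConvexHull

def packing_density_reward(box_corners: np.ndarray, total_box_volume: float) -> float:
    """Sketch of a hull-based packing reward (hypothetical design).

    box_corners: (N*8, 3) array of corner points of all placed boxes.
    total_box_volume: summed volume of those boxes.
    """
    if box_corners.shape[0] < 4:
        return 0.0  # a 3D convex hull needs at least 4 non-coplanar points
    hull = ConvexHull(box_corners)
    # Tight packings have a hull volume close to the boxes' own total volume,
    # so this ratio approaches 1 for dense placements and falls toward 0
    # for sprawling ones.
    return total_box_volume / hull.volume
```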
I would imagine a layered approach with two different agents could work adequately: one for solving the 3D-BPP, and one for controlling the individual motors of the robot arm to move a box to a given spot, so that the 3D-BPP solver's output serves as input to the arm controller agent (see the sketch below). However, I can't imagine these two agents being completely decoupled, since some placements chosen by the 3D-BPP solver may be physically unreachable for the arm without disturbing previously placed boxes.
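To make the layered idea concrete, this is roughly the interface I have in mind; it's only a sketch, and the class names, observation shapes, and the `sim.step` call are all made up for illustration:

```python
import numpy as np

class BinPackingPlanner:
    """High-level agent (hypothetical): given the incoming box and the current
    pile, proposes a target pose (x, y, z, yaw) for the box."""
    def act(self, box_size: np.ndarray, heightmap: np.ndarray) -> np.ndarray:
        ...  # e.g. a policy trained on an abstracted 3D-BPP environment

class ArmController:
    """Low-level agent (hypothetical): given a target pose, outputs joint
    commands each physics step until the box is placed."""
    def act(self, target_pose: np.ndarray, robot_state: np.ndarray) -> np.ndarray:
        ...  # e.g. a goal-conditioned pick-and-place policy

# One placement, wiring the two together (pseudocode):
# target = planner.act(box_size, heightmap)
# while not placed:
#     torques = controller.act(target, robot_state)
#     robot_state, placed = sim.step(torques)
```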
In scenarios like this, I'm wondering what the usual approach is:
- Use a single agent that handles both seemingly distinct tasks (solving the 3D-BPP and controlling the robot arm) by itself?
- Actually use two agents and introduce some complexity into the training procedure so that the solver can take the arm controller's movement capabilities into account?
In case this is a trivial question, any link to beginner-friendly literature that I could read up on would be greatly appreciated!