SAC for Hybrid Action Space
My team and I are working on a project to build a robot that learns to play simple piano compositions using RL. We're building on an existing simulation environment (paper website: https://kzakka.com/robopianist/) and replacing its robot hands with our own custom design. The authors of that paper use DroQ (a regularized variant of SAC) with a purely continuous action space, along with the standard automatic entropy temperature adjustment described in https://arxiv.org/pdf/1812.05905. Their full implementation can be found here: https://github.com/kevinzakka/robopianist-rl.
In our hand design, each finger has only two degrees of freedom: it can rotate left and right (servo -> continuous action) and move up and down (solenoid -> binary/discrete action). It closely resembles this design: https://youtu.be/rgLIEpbM2Tw?si=Q8Opm1kQNmjp92fp. The issue I'm running into is how best to handle this multi-dimensional hybrid (continuous-discrete) action space. I've looked at this paper: https://arxiv.org/pdf/1912.11077, which MATLAB also seems to implement for its hybrid SAC, but I'm curious whether anyone has further suggestions or advice, especially on implementing multiple discrete/binary action dimensions (i.e., one per finger). I've also seen other implementations that use a Gumbel-softmax approach (e.g. https://arxiv.org/pdf/2109.08512).
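To make the question concrete, here is a minimal sketch of one factorization I'm considering. It is an assumption on my part, not taken from either paper or from the robopianist-rl code: the policy emits a tanh-squashed Gaussian per servo and an independent Bernoulli per solenoid, so the joint entropy of the discrete part is just the sum of per-finger binary entropies and can be computed exactly instead of estimated from samples. The names `sample_hybrid_action`, `servo_mean`, `servo_log_std`, and `solenoid_logits` are hypothetical placeholders for whatever the actor network would output.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bernoulli_entropy(p, eps=1e-8):
    """Exact entropy (in nats) of a Bernoulli(p) variable."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def sample_hybrid_action(servo_mean, servo_log_std, solenoid_logits, rng):
    """Sample one hybrid action for all fingers (hypothetical helper).

    Assumes the servo and solenoid of each finger are conditionally
    independent given the state. Returns
    (servo, solenoid, servo_log_prob, solenoid_entropy).
    """
    servo, servo_log_prob = [], 0.0
    for mu, log_std in zip(servo_mean, servo_log_std):
        std = math.exp(log_std)
        u = mu + std * rng.gauss(0.0, 1.0)  # pre-squash Gaussian sample
        a = math.tanh(u)                    # squash to [-1, 1]
        # Gaussian log-density of u ...
        log_prob = (-0.5 * ((u - mu) / std) ** 2
                    - log_std - 0.5 * math.log(2.0 * math.pi))
        # ... minus the tanh change-of-variables term, as in standard SAC.
        log_prob -= math.log(1.0 - a * a + 1e-6)
        servo.append(a)
        servo_log_prob += log_prob

    solenoid, solenoid_entropy = [], 0.0
    for logit in solenoid_logits:
        p = sigmoid(logit)                  # probability of pressing the key
        solenoid.append(1 if rng.random() < p else 0)
        solenoid_entropy += bernoulli_entropy(p)  # exact, no sampling noise
    return servo, solenoid, servo_log_prob, solenoid_entropy


# Example: five fingers, one servo and one solenoid each.
rng = random.Random(0)
servo, solenoid, logp, ent = sample_hybrid_action(
    [0.0] * 5, [-1.0] * 5, [0.0] * 5, rng
)
```

With this split, the continuous entropy bonus would use `-servo_log_prob` as in ordinary SAC, while the discrete bonus would use `solenoid_entropy` directly. I suspect the two terms may need separate temperature coefficients, but I'm not sure that is the right call, which is part of what I'm asking.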
I apologize in advance for any ignorance; I'm an undergraduate student who is somewhat new to this. Any suggestions or guidance would be greatly appreciated. Thank you!