Making a quadruped robot walk is not just a matter of movement, but of dynamic balance. The challenge for DII researchers was to train the Unitree Aliengo robot to move smoothly while strictly respecting physical and kinematic constraints. To achieve this, the team used Reinforcement Learning (RL), but with an innovative twist: instead of relying solely on standard algorithms, they developed a custom variant designed to prioritize system safety. The technological core of the project is a modified version of the Proximal Policy Optimization (PPO) algorithm. While traditional methods penalize errors (such as falls or collisions) by simply subtracting points from the robot’s score, the proposed solution introduces an adaptive discount factor γ (Fig. 1).
PPO is a Reinforcement Learning algorithm belonging to the policy gradient family. It iteratively optimizes the policy πθ while limiting how far each updated policy can move from the previous one through a clipping mechanism, which improves training stability. For quadruped robots, the main constraint violations include exceeding kinematic limits, operating beyond velocity or acceleration thresholds, incorrect ground contacts, loss of balance or falls, and deviations from the desired gait style (Fig. 2).
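The clipping mechanism described above can be illustrated with a minimal sketch of the standard PPO clipped surrogate objective. This is the textbook form, not the team's code: the function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the previous policy. clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the previous policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any gain from moving the ratio
        # outside the clipping interval.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Because the objective stops rewarding ratios outside the interval, a single update has little incentive to push the policy far from the one that collected the data, which is the stabilizing effect mentioned above.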
In traditional RL methods, these violations are handled through penalties embedded directly in the reward. The DII variant, based on the Constraints as Terminations framework, instead introduces an adaptive discount factor that is modulated according to the severity of the violation. When the robot’s behavior deviates significantly from the imposed constraints, the weight of future rewards is reduced, which encourages the learning of safer, more stable, and more energy-efficient gaits. This approach is a form of Constrained Reinforcement Learning.
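One way to picture a severity-modulated discount is sketched below. It assumes that each constraint violation is normalized to a severity between 0 and 1, and that the effective discount at a step is γ(1 − δ), where δ is the largest severity at that step. The function names, the normalization by `max_violations`, the cap `p_max`, and the indexing convention are illustrative assumptions, not the formulation used in the project.

```python
def termination_probability(violations, max_violations, p_max=1.0):
    """Map raw constraint violations to a severity delta in [0, p_max].

    violations[i] > 0 means constraint i is violated by that amount;
    max_violations[i] is an assumed normalization scale for constraint i.
    """
    delta = 0.0
    for c, c_max in zip(violations, max_violations):
        severity = min(max(c / c_max, 0.0), 1.0)  # clip to [0, 1]
        delta = max(delta, p_max * severity)      # worst constraint dominates
    return delta

def discounted_return(rewards, deltas, gamma=0.99):
    """Return where step t discounts the future by gamma * (1 - deltas[t])."""
    ret = 0.0
    for r, delta in zip(reversed(rewards), reversed(deltas)):
        ret = r + gamma * (1.0 - delta) * ret
    return ret

# Hypothetical usage: a severe violation at the first step removes all
# future reward from the return, while no violation leaves it intact.
safe = discounted_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5)
unsafe = discounted_return([1.0, 1.0, 1.0], [1.0, 0.0, 0.0], gamma=0.5)
```

In this sketch the reward itself is never penalized. A violation only shrinks how much the future counts, so the policy gains the most by staying inside the constraints.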
The successful transfer of the robot’s digital “mind” to its physical body was ensured through a rigorous validation process:
The results achieved in Trento are just the beginning. Future research will focus on more challenging terrains and on loco-manipulation: integrating robotic arms on quadrupeds, enabling them not only to explore but also to actively interact with the environment in logistics and rescue scenarios (Fig. 3).
Fig. 1: Unitree Aliengo Robot
Fig. 2: RL Policy Control Scheme
Fig. 3: MuJoCo Unitree Aliengo Model