How a robot policy trains in simulation: the episode loop that generates experience, and how the reward signal shapes what the agent learns. This loop produces the data that a vision-language-action (VLA) or reinforcement learning (RL) policy learns from.
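As a rough illustration of the episode loop, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `ToyReachEnv` is a hypothetical one-dimensional "reach the target" simulator (not a real library API), the reward (negative distance plus a success bonus) is just one possible shaping choice, and `proportional_policy` stands in for whatever policy is being trained.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One step of experience: the unit of data a policy learns from."""
    obs: float
    action: float
    reward: float
    next_obs: float
    done: bool


class ToyReachEnv:
    """Hypothetical 1-D simulator: move a point from 0.0 toward a target."""

    def __init__(self, target=1.0, max_steps=20, tolerance=0.05):
        self.target = target
        self.max_steps = max_steps
        self.tolerance = tolerance
        self.pos = 0.0
        self.t = 0

    def reset(self):
        # Start every episode from the same initial state.
        self.pos = 0.0
        self.t = 0
        return self.pos

    def step(self, action):
        # Clip the action to mimic an actuator limit (an assumed detail).
        action = max(-0.2, min(0.2, action))
        self.pos += action
        self.t += 1
        dist = abs(self.target - self.pos)
        reached = dist < self.tolerance
        # Assumed reward shaping: dense distance penalty plus a success bonus.
        reward = -dist + (1.0 if reached else 0.0)
        # The episode ends on success or when the step budget runs out.
        done = reached or self.t >= self.max_steps
        return self.pos, reward, done


def run_episode(env, policy):
    """Roll out one episode and return the collected transitions."""
    obs = env.reset()
    transitions = []
    done = False
    while not done:
        action = policy(obs)  # the action the policy requested (pre-clipping)
        next_obs, reward, done = env.step(action)
        transitions.append(Transition(obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions


def proportional_policy(obs, target=1.0, gain=0.5):
    # Placeholder policy: step a fraction of the remaining distance.
    return gain * (target - obs)


episode = run_episode(ToyReachEnv(), proportional_policy)
episode_return = sum(t.reward for t in episode)
```

The list of `Transition` records is the experience the loop generates. In this sketch the reward only labels each step; how a learner uses those labels (for example, a policy-gradient update for RL, or filtering successful episodes for imitation-style training) is a separate design choice not shown here.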