Deep (Federated) Reinforcement Learning-based Action Prediction Model for Error Minimization in GGPO Rollback-based Online Gaming

When player A changes action, they send the update to player B, where it arrives after some network delay.
During this delay, player B's simulation proceeds as if player A's previous state and action were unchanged.
When player A's input finally reaches player B, player B's system state is rolled back and replayed as if the input had been received at the moment player A issued it.
Assumption: the players' clocks are synchronized.
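The rollback-and-replay mechanism above can be sketched as follows. This is a minimal illustration, not GGPO's actual implementation: the game state is modeled as a single integer advanced by both players' per-frame inputs, and the names (`Game`, `step`, `receive_remote`) are hypothetical.

```python
class Game:
    """Toy rollback simulation: state is an integer driven by two inputs per frame."""

    def __init__(self):
        self.state = 0
        self.frame = 0
        self.inputs = {}          # frame -> (local_input, guessed_remote_input)
        self.snapshots = {0: 0}   # frame -> saved state at the start of that frame

    def step(self, local, remote_guess):
        """Advance one frame using the local input and a guess for the remote one."""
        self.inputs[self.frame] = (local, remote_guess)
        self.state += local + remote_guess
        self.frame += 1
        self.snapshots[self.frame] = self.state

    def receive_remote(self, frame, actual):
        """A delayed remote input arrives; roll back and replay if the guess was wrong.

        Returns the rollback magnitude in frames (0 if the guess was correct).
        """
        local, guess = self.inputs[frame]
        if guess == actual:
            return 0
        self.inputs[frame] = (local, actual)
        self.state = self.snapshots[frame]   # restore the snapshot at `frame`
        for f in range(frame, self.frame):   # re-simulate forward to the present
            l, r = self.inputs[f]
            self.state += l + r
            self.snapshots[f + 1] = self.state
        return self.frame - frame
```

For example, if player B guessed a remote input of 0 for two frames and the true input at the first of them turns out to be 1, `receive_remote` restores the snapshot at that frame and replays both frames, reporting a rollback magnitude of 2.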

Idea: instead of naively assuming that the state perceived at the immediately preceding frame persists, we develop an AI-based action/state predictor that minimizes the rollback magnitude.
We model the predictor as a deep reinforcement learning agent that receives a positive reward when the rollback is small, effectively learning to predict an individual player's actions over a variable time horizon dictated by network conditions.
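The reward structure can be illustrated with a deliberately simplified stand-in for the deep agent: a tabular predictor of the remote player's next action conditioned on their last action, updated with a reward equal to the negative rollback magnitude. All names here (`ActionPredictor`, `predict`, `update`) are hypothetical, and a real system would replace the table with a neural network over a richer state.

```python
from collections import defaultdict

class ActionPredictor:
    """Tabular sketch of the rollback-minimizing predictor (not a deep agent)."""

    def __init__(self, actions, lr=0.1):
        self.actions = actions
        self.lr = lr
        # score[last_action][candidate] ~ expected reward of predicting candidate
        self.score = defaultdict(lambda: defaultdict(float))

    def predict(self, last_action):
        """Return the candidate action with the highest learned score."""
        row = self.score[last_action]
        return max(self.actions, key=lambda a: row[a])

    def update(self, last_action, predicted, actual, rollback_frames):
        """Reward is 0 for a correct prediction, -rollback_frames otherwise."""
        reward = 0.0 if predicted == actual else -float(rollback_frames)
        row = self.score[last_action]
        # Move the predicted action's score toward the observed reward ...
        row[predicted] += self.lr * (reward - row[predicted])
        # ... and, on a miss, nudge the action that actually occurred upward.
        if predicted != actual:
            row[actual] += self.lr
```

After a few episodes of mispredicting a player who always follows 'L' with 'R' (and paying a 3-frame rollback each time), the predictor's scores shift until it predicts 'R' after 'L'.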

To improve model performance, the partially trained models of a set of players can be aggregated (on a centralized server, in a distributed fashion among peers, or randomly in gossip style) to learn generic player behavior, which is then specialized to each player's individual characteristics.
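The centralized variant of this aggregation is essentially FedAvg. A minimal sketch, assuming each client's model is represented as a dict mapping parameter names to scalar values (the function name and the optional per-client sample weights are illustrative):

```python
def federated_average(client_models, client_weights=None):
    """FedAvg-style aggregation: per-parameter weighted average across clients.

    client_models  -- list of dicts, each mapping parameter name -> value
    client_weights -- optional per-client weights (e.g. number of local samples);
                      defaults to a uniform average
    """
    if client_weights is None:
        client_weights = [1.0] * len(client_models)
    total = sum(client_weights)
    avg = {}
    for params, w in zip(client_models, client_weights):
        for name, value in params.items():
            avg[name] = avg.get(name, 0.0) + (w / total) * value
    return avg
```

Each player would then fine-tune the averaged model on their own locally recorded inputs, yielding the personalized predictor while only model parameters, never raw gameplay data, leave the device.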
