Hello,
Adaptive load-balancing algorithms rely on statistics, but when a driver restarts or the hardware fails, those statistics are lost and the algorithm's adaptation starts over from the beginning.
- Is it possible (and does it make sense) to save job execution statistics periodically and load them into the same driver when it restarts, or into another driver that is already running?
- Another idea: share these statistics with peer drivers, so that when one of them goes down the information still exists on the other peers, and when it restarts, or when a new driver is added as a peer, it starts with the existing statistics.
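
To make the two ideas above concrete, here is a rough, language-neutral sketch in Python of what I have in mind. It is not based on any existing driver API: `NodeStats`, `save_stats`, `load_stats` and `merge_stats` are hypothetical names, and I am assuming the statistics can be reduced to a sample count and a mean execution time per node. The merge uses a sample-weighted mean, which also assumes the peers did not observe the same jobs (otherwise samples would be counted twice).

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass
class NodeStats:
    """Hypothetical per-node summary: number of jobs seen and mean execution time."""
    samples: int = 0
    mean_ms: float = 0.0

    def record(self, elapsed_ms: float) -> None:
        # Incremental mean update, so no raw history needs to be kept.
        self.samples += 1
        self.mean_ms += (elapsed_ms - self.mean_ms) / self.samples


def save_stats(stats: dict, path: str) -> None:
    """Write a snapshot atomically so a crash mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({node: asdict(s) for node, s in stats.items()}, f)
    # os.replace swaps the file in a single step on the same filesystem.
    os.replace(tmp_path, path)


def load_stats(path: str) -> dict:
    """Load a snapshot; a missing file means a cold start with empty statistics."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {node: NodeStats(**raw) for node, raw in json.load(f).items()}


def merge_stats(local: dict, remote: dict) -> dict:
    """Combine two snapshots using a sample-weighted mean per node."""
    merged = {}
    for node in set(local) | set(remote):
        a = local.get(node, NodeStats())
        b = remote.get(node, NodeStats())
        total = a.samples + b.samples
        mean = (a.samples * a.mean_ms + b.samples * b.mean_ms) / total if total else 0.0
        merged[node] = NodeStats(total, mean)
    return merged
```

A driver would call `save_stats` on a timer and `load_stats` at startup (first idea), and call `merge_stats` whenever it receives a snapshot from a peer (second idea), so a restarted or newly added driver would begin from the merged state instead of from zero.
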
We are planning to use p2p to avoid a single point of failure, but the algorithm's learning progress is important and shouldn't be reset every time a server restarts. Maybe I'm wrong, but using common statistics across all peer drivers (including drivers added later) makes sense to me.