Fault Tolerance

SYMPHONY is designed to be fault tolerant. Fault tolerance is essential when solving large problems on networks whose processors may fail unpredictably. The tree manager tracks the status of all processes and can restart them as necessary. Since the state of the entire tree is known at all times, killing an LP process or a cut generator process loses at most the work already completed on the search node that process was handling. To protect against the loss of the tree manager itself or of a cut pool, full logging capabilities have been implemented. If desired, the tree manager can periodically write the entire state of the tree to disk, allowing a warm restart if a fault occurs. Similarly, the cut pool process can be warm-started from a log file. This allows not only for fault tolerance but also for full reconfiguration of the algorithm in the middle of solving a long-running problem. Such reconfiguration could range from adding more processors to moving the entire solution process to another network.
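The bookkeeping described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not SYMPHONY's implementation (which is written in C and runs as separate communicating processes). The class and method names (`TreeManager`, `assign`, `worker_failed`, `checkpoint`, `warm_start`) and the JSON log format are assumptions made for the example. It shows two ideas: because the tree manager records which node each worker holds, a failed worker costs only the work on that one node, which goes back on the candidate list; and writing the full tree state to disk lets a fresh tree manager warm-start from the log.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass
class SearchNode:
    node_id: int
    lower_bound: float
    depth: int


class TreeManager:
    """Hypothetical, simplified tree manager; not SYMPHONY's actual API."""

    def __init__(self):
        self.candidates = {}  # node_id -> SearchNode waiting to be processed
        self.active = {}      # worker name -> SearchNode currently being processed

    def add_node(self, node):
        self.candidates[node.node_id] = node

    def assign(self, worker):
        # Hand the candidate with the smallest lower bound to the worker
        # (best-first selection, assumed here for simplicity).
        if not self.candidates:
            return None
        best_id = min(self.candidates, key=lambda i: self.candidates[i].lower_bound)
        node = self.candidates.pop(best_id)
        self.active[worker] = node
        return node

    def worker_failed(self, worker):
        # The manager knows which node the dead worker held, so only the
        # work on that node is lost: put the node back on the candidate list.
        node = self.active.pop(worker, None)
        if node is not None:
            self.candidates[node.node_id] = node
        return node

    def checkpoint(self, path):
        # Write the entire tree state to disk. In-progress nodes are saved
        # as candidates, since their partial results would not survive a restart.
        nodes = list(self.candidates.values()) + list(self.active.values())
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump([asdict(n) for n in nodes], f)
        os.replace(tmp, path)  # atomic rename keeps the previous log if we crash mid-write

    @classmethod
    def warm_start(cls, path):
        # Rebuild a tree manager from the most recent checkpoint.
        tm = cls()
        with open(path) as f:
            for record in json.load(f):
                tm.add_node(SearchNode(**record))
        return tm


if __name__ == "__main__":
    tm = TreeManager()
    tm.add_node(SearchNode(1, 10.0, 1))
    tm.add_node(SearchNode(2, 12.5, 1))
    tm.assign("lp0")        # node 1 has the best bound, so lp0 receives it
    tm.worker_failed("lp0")  # lp0 dies; node 1 returns to the candidate list
    log = os.path.join(tempfile.mkdtemp(), "tree.log")
    tm.checkpoint(log)
    restored = TreeManager.warm_start(log)
    print(sorted(restored.candidates))  # [1, 2]
```

Writing to a temporary file and then renaming it means a fault during checkpointing leaves the previous log usable. A real system would also have to log state this sketch ignores, such as the best solution found so far and the contents of the cut pool.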



Ted Ralphs
Thu Jun 8 14:31:17 CDT 2000