An interesting paper describing why computers stop and possible ways to mitigate such failures. Written by Jim Gray of Tandem Computers, it presents the results of a study of system failures in large, distributed systems, along with ways to prevent those failures and improve mean time between failures (MTBF).