Software Engineering Radio is a podcast targeted at the professional software developer. The goal is to be a lasting educational resource, not a newscast. SE Radio covers all topics software engineering. Episodes are either tutorials on a specific topic, or an interview with a well-known character from the software engineering world. All SE Radio episodes are original content — we do not record conferences or talks given in other venues. Each episode comprises two speakers to ensure a lively listening experience. SE Radio is brought to you by the IEEE Computer Society and IEEE Software magazine.

Episode 78: Fault Tolerance with Bob Hanmer Pt. 2

November 23, 2007 45:46 43.94 MB Downloads: 0
This is the second part of the discussion on fault tolerance with Bob Hanmer (if you didn't listen to Episode 77, which contains part one, please go back and listen now; this episode builds on that previous one!) We start by discussing a set of error detection patterns. Among are the well-known approaches such as checksums and voting. We then look at error recovery patterns, including restart, rollback or roll forward. The next section looks at error mitigation patterns, which include shedding load and doing fresh work before stale. The last patterns section then looks at fault treatment patterns. We conclude the episode with a small discussion about how to design systems using (these and other) patterns, and with some thoughts on why actually wrote the book.