Authors: A. J. Oliner, A. V. Kulkarni, and A. Aiken
Title: Using Correlated Surprise to Infer Shared Influence
Published: International Conference on Dependable Systems and Networks (DSN), 2010.
During the DARPA Grand Challenge race in 2005, the autonomous vehicle named Stanley, Stanford’s entry, slowed down and swerved around an obstacle that was not actually there. Stanley did this several times over the course of the race, nearly causing it to be disqualified. Although Stanley went on to win the competition, the Stanford Racing Team was justifiably vexed: why had Stanley hallucinated these obstacles?
Even with a hand-crafted dependency diagram, the golden ideal to which all previous work on dependency inference aspires, it took the designers of the system nearly two months to isolate the problem, which originated in a buffer component shared by the laser sensors. This shared buffer was intermittently dropping measurements, causing Stanley to see stale, inconsistent data about the world around him, which sometimes meant seeing obstacles where there were none. Every other component of the system was behaving according to specification, which partially explains why the dependency diagram was so unhelpful: the source of the problem and its outward manifestation (swerving) were on opposite logical ends of the system, but the dependency diagram advised looking at almost every component in between. The shared buffer was not even on the diagram.
In a paper to appear at DSN 2010, we introduce the idea of computing influence, a type of component interaction that is orthogonal to dependencies and allows us to capture implicit interactions among components and subsystems. For the Stanley swerving bug, our method not only infers an influence directly between the swerving behavior and misbehavior near the laser sensors but also implicates an uninstrumented component shared by those lasers: the true cause of the problem. The Racing Team says the results of our analysis, which took only a few seconds to compute, would have saved them two months of debugging.
Computing the strength of shared influence between components is straightforward:
- Represent the behavior of each component as a function of surprise over time, called an anomaly signal.
- See how well these functions “line up” using a standard technique called cross-correlation.
- Summarize cross-correlations in a Structure-of-Influence Graph (SIG), where the edges indicate the strength and time-delay of the influence.
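The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the paper: the function names (`anomaly_signal`, `cross_correlation`, `build_sig`), the parameters (`window`, `max_lag`, `threshold`), and the component names in the demo are all hypothetical. The surprise measure, deviation from a trailing mean, is a simple stand-in chosen for brevity.

```python
from math import sqrt


def anomaly_signal(values, window=5):
    """Step 1 (assumed surprise measure): absolute deviation of each
    measurement from the mean of the preceding `window` measurements."""
    signal = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if not history:
            signal.append(0.0)  # nothing to compare against yet
            continue
        mean = sum(history) / len(history)
        signal.append(abs(v - mean))
    return signal


def cross_correlation(a, b, max_lag):
    """Step 2: normalized cross-correlation of two equal-length signals.

    Returns {lag: correlation}. A positive lag means anomalies in `b`
    tend to follow anomalies in `a` by that many time steps.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += da[t] * db[u]
        result[lag] = total / norm if norm else 0.0
    return result


def build_sig(signals, max_lag=10, threshold=0.5):
    """Step 3: summarize pairwise correlations as a list of edges
    (component_a, component_b, strength, delay), keeping only pairs
    whose peak correlation magnitude reaches `threshold`."""
    names = sorted(signals)
    edges = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            cc = cross_correlation(signals[name_a], signals[name_b], max_lag)
            lag, strength = max(cc.items(), key=lambda kv: abs(kv[1]))
            if abs(strength) >= threshold:
                edges.append((name_a, name_b, strength, lag))
    return edges


if __name__ == "__main__":
    # Hypothetical anomaly signals: "planner" echoes "laser" three steps later.
    laser = [0.0] * 50
    planner = [0.0] * 50
    for t in (10, 25, 40):
        laser[t] = 1.0
        planner[t + 3] = 1.0
    for edge in build_sig({"laser": laser, "planner": planner}, max_lag=4):
        print(edge)
```

Each edge carries the two quantities the SIG is meant to expose: how strongly two components' surprises line up, and the time delay at which they line up best. The threshold and lag window are tuning choices in this sketch and would need to be set for the system at hand.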
Our paper gives a mathematical foundation for influence, as described above, and evaluates it using both simulations of idealized systems and case studies with real systems, including Stanley, his successor (Junior), and the Thunderbird supercomputer.