How do you get your hands around the determination of a relevant sequence among a massive number of events that precede some specific outcome? More to the point: if we want to be able to anticipate some outcome as a result of a set of events, how can we figure out what the most probable events are and the sequence in which they appear?
The best way to approach this challenge is to compare it to an analysis that we might be more familiar with: market basket analysis. One goal of market basket analysis is to determine what sets of items are purchased together as a way of facilitating improved product bundling, packaging and placement to drive product up-selling and cross-selling. The analysis considers the collections of items that all individuals purchase at a single time and looks for any collections (or better yet, subsets) of items that appear together with some degree of frequency. In turn, recommendation engines can take the results of this analysis to suggest product cross-sales. One example is manifested in online sales, in which the user is advised that “people who purchased this item often purchased these other items” or are presented with special pricing for selecting a bundle of items instead of just a single item.
Event sequences are very similar: we are basically looking for the collection of events that are in each customer’s “event basket” prior to the specific outcome, and then looking for those events that appear together in the “event basket” most frequently. This challenge is a little more difficult, though, for one reason: the market basket can be analyzed as a single unit at the point of checkout, while the “event basket” may be affected by a number of additional variables, such as:
- The number of events that precede the outcome
- The time duration over which the events take place
- The order of the events
- The existence of irrelevant events that do not contribute to the outcome
These additional complexities notwithstanding, the market basket analytics algorithms can be adapted to scan the numerous sets of events and come up with some number of sequence patterns that, even if not perfect, still provide some ability to anticipate the outcome and take some action to either encourage it (if it is a positive one) or prevent it (in the negative case). It is just a matter of monitoring those millions of events to find the specific pattern that would precipitate a notification. More on this next time…