Certainly, enabling a coordinator that collects data from numerous sources to integrate those sources and then aggregate individual line-item statistics requires more firepower than simply sweeping up data sets and dumping them into a single database. One approach is to use master data management (MDM) techniques, first for entity identification and then for identity resolution.
Clearly, the same individual will be represented in different ways across the different data sets (for example, “Smith, Bob J.” in one and “Dr. Robert J Smith” in another). Entity identification can be used to scan the data fields that hold individual names, parse the relevant tokens out of those name strings, and then standardize the representations.
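To make the parse-and-standardize step concrete, here is a minimal sketch. It assumes ASCII name strings, handles only the “Last, First” reordering, and uses a tiny hypothetical nickname table and noise-token list; real MDM tools rely on much larger curated reference data and more careful parsing rules.

```python
import re

# Hypothetical nickname table; a production system would use a curated reference list.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}

# Honorifics and generational suffixes dropped before matching (illustrative only).
NOISE_TOKENS = {"mr", "mrs", "ms", "dr", "jr", "sr", "ii", "iii"}


def standardize_name(raw: str) -> str:
    """Parse a free-form name string into a canonical 'first middle last' form."""
    raw = raw.strip()
    # Handle "Last, First Middle" ordering by moving the surname to the end.
    if "," in raw:
        last, _, rest = raw.partition(",")
        raw = f"{rest} {last}"
    # Tokenize on anything that is not an ASCII letter or apostrophe, lowercasing first.
    tokens = [t for t in re.split(r"[^a-z']+", raw.lower()) if t]
    # Drop honorifics and suffixes that do not help identify the individual.
    tokens = [t for t in tokens if t not in NOISE_TOKENS]
    # Map known nicknames to their formal given names.
    tokens = [NICKNAMES.get(t, t) for t in tokens]
    return " ".join(tokens)
```

With these assumptions, both “Smith, Bob J.” and “Dr. Robert J Smith” standardize to the same string, `"robert j smith"`, which is what makes the later matching step tractable.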
Identity resolution then matches the standardized representations against a master index to determine whether each entity is already registered there. If so, the matched record(s) can be linked to that unique identity and cached for later aggregation. If not, there are two options. The obvious one is to create a new entry; the more thoughtful one is to apply MDM techniques again to find the closest existing matches and have a data practitioner work with the data publishers to confirm or reject any potential matches.
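The three outcomes described above (link to an existing identity, create a new one, or hold for human review) can be sketched as a toy in-memory master index. This is an illustration under stated assumptions, not a reference MDM implementation: the `MasterIndex` class, the `M000001` ID format, the `"publisher:local_id"` record keys, and the 0.85 cutoff are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure a real matching engine would use.

```python
from difflib import SequenceMatcher


class MasterIndex:
    """Toy in-memory master index keyed by standardized name (illustrative only)."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold  # assumed similarity cutoff for "close" matches
        self.ids = {}               # standardized name -> master ID
        self.links = {}             # master ID -> source record keys, cached for aggregation
        self.review_queue = []      # (record key, name, candidates) for a data practitioner

    def resolve(self, std_name: str, record_key: str):
        # Exact match on the standardized form: link the record to that identity.
        if std_name in self.ids:
            master_id = self.ids[std_name]
            self.links[master_id].append(record_key)
            return master_id, "linked"
        # No exact match: look for near matches before minting a new identity.
        close = [
            known for known in self.ids
            if SequenceMatcher(None, std_name, known).ratio() >= self.threshold
        ]
        if close:
            # Defer the decision to a practitioner working with the data publishers.
            self.review_queue.append((record_key, std_name, close))
            return None, "review"
        # Nothing similar on file: create a new master entry.
        master_id = f"M{len(self.ids) + 1:06d}"
        self.ids[std_name] = master_id
        self.links[master_id] = [record_key]
        return master_id, "created"
```

In this sketch a second “robert j smith” record is linked to the first one’s master ID, while “robert j smyth” is close enough to be queued for review rather than silently becoming a new entity. Choosing review over automatic creation for near misses is the “more thoughtful” option; the threshold controls how much work lands in the practitioner’s queue.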
At the same time, it is in the coordinator’s best interest to communicate the shared index of entities to the organizations providing data in a managed way. I say “managed” because different situations will be constrained by different data policies. For example, there may be a presumption that one company’s enumeration of individuals is not to be shared directly with any other company; publishing the full master list would violate that presumption.
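One way to share the index “in a managed way” is to give each data provider only its own slice: the master ID assigned to each record it submitted, and nothing about which other providers hold the same entity. The sketch below assumes that hypothetical policy and the hypothetical `"publisher:local_id"` record-key format; the `publisher_view` name is illustrative, and the actual rules would come from the community’s data policies.

```python
def publisher_view(links: dict, publisher: str) -> dict:
    """Return only the slice of the master index one publisher may see.

    Assumed (hypothetical) policy: a publisher learns the master ID for each
    of its own records, but not which other publishers hold the same entity,
    nor any entity it did not itself submit. Record keys are assumed to have
    the form "publisher:local_id".
    """
    view = {}
    for master_id, record_keys in links.items():
        for key in record_keys:
            owner, _, local_id = key.partition(":")
            # Expose the mapping only for records this publisher contributed.
            if owner == publisher:
                view[local_id] = master_id
    return view


links = {"M000001": ["acme:17", "globex:4"], "M000002": ["acme:18"]}
print(publisher_view(links, "globex"))  # {'4': 'M000001'}
```

Note that if two publishers compare the master IDs they receive, they can still infer overlap; a stricter policy could issue publisher-specific pseudonymous identifiers instead of the shared master IDs.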
One quick thought: since the coordinator must manage the master index anyway, could it also provide an identity resolution service to help standardize entity representations across the community? If so, that service defines a de facto standard of representation that can be communicated to (and, ideally, deployed within) each organization in the community. This is an example of how managed services can help define federated governance policies.
Next time: some other thoughts about the operational model.