The White House released a set of reports this month on Big Data and its privacy implications. While a number of folks have been discussing the President’s Council of Advisors on Science & Technology (“PCAST”) report, I would offer that the Office of Science and Technology Policy (“OSTP”) report needs to be read in conjunction with it. The two do different things. One is a report on the technical state of affairs, and the other is more of a policy direction piece, which is driven by the technologically-oriented findings. Various points of view have been put forth as to the relative merits of each report, but an important element seems to be missing from both. Both reports discuss the need for policy decisions to be based on context and on desired outcomes. Unfortunately, neither report gives a good taxonomy of the informatics ecosystem that would allow a clear path forward on “context” and “desired outcomes”. What I mean by this is best summed up by a comment in the PCAST report: “In this report, PCAST usually does not distinguish between ‘data’ and ‘information’.” “Data” and “Information” are very different things, and one really can’t have a coherent policy discussion unless the distinction between the two is recognized and managed.
The importance of having a clear taxonomy around the informatics lifecycle cannot be overstated. In fact, the challenges of most privacy system implementations stem from the lack of one. For example, attempting to classify “personal information” is not an easy thing. Is a first/last name combination with ZIP personal information? If the name is John Smith and the ZIP is 11004, likely not. However, if the name is John Tomaszewski and the ZIP is 77002, it absolutely is personally identifiable: there is only one of me. Consequently, we need a better way of describing the different elements of the taxonomy and how they relate to one another.
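To make the name-and-ZIP point concrete, here is a minimal Python sketch. It assumes a tiny, made-up list of records (the rows, the `combination_counts` helper, and the `is_identifying` helper are all hypothetical, invented purely for illustration) and treats a combination as identifying when it appears exactly once in that particular data set. It is a rough illustration of the idea, not a legal or technical test for what counts as personal information.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
records = [
    {"first": "John", "last": "Smith", "zip": "11004"},
    {"first": "John", "last": "Smith", "zip": "11004"},
    {"first": "John", "last": "Smith", "zip": "11004"},
    {"first": "John", "last": "Tomaszewski", "zip": "77002"},
]


def combination_counts(rows):
    """Count how many rows share each (first, last, zip) combination."""
    return Counter((r["first"], r["last"], r["zip"]) for r in rows)


def is_identifying(row, counts):
    """Assumption: a combination seen exactly once singles out one person in this data set."""
    return counts[(row["first"], row["last"], row["zip"])] == 1


counts = combination_counts(records)
flags = [is_identifying(r, counts) for r in records]

for row, flag in zip(records, flags):
    print(row["first"], row["last"], row["zip"], "identifying" if flag else "not identifying")
```

The same three fields are harmless in one row and identifying in another, which is why classifying fields in isolation, without their context, tends to fail.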
Often, we hear Data and Information used interchangeably. They most certainly are not interchangeable. Data, by itself, is a representation, or token, of a fact. For example, 77002 is data. It is a ZIP code. By itself, data isn’t very useful. You can’t act on raw data. This is the foundational state for the taxonomy. It is also rather rare in the real world.
Information is the next transformative state of Data. It is Data used within a context. The context, or “metadata”, is what gives value to the Data. To go back to the name and ZIP example, the context that the last name is Polish and the ZIP is in Houston transforms two simple data points into Information. You now have the identity of a unique individual.
Knowledge is the next transformative state of Information: a pattern emerges. Not only is Knowledge actionable, it can also be used to evaluate and identify past patterns. Beyond enabling action, Knowledge provides the capability of Understanding.
The final transformative state in this taxonomy is Wisdom (you can call it whatever you want, but this seems to fit). Wisdom is enough Knowledge to be able to start predicting future states.
Each of the states gets triggered by a critical mass of the prior state being fused together. This continued fusion of Data with more and more Data is what makes Big Data useful – you can finally get to Wisdom.
The challenge with the two White House reports is that they discuss the risks associated with Big Data without describing which level in the taxonomy they are concerned with. Each successive level of the taxonomy has a greater potential for impact (both good and bad). Consequently, if you are looking for context-based, outcome-driven policy, you need to know which layer of the taxonomy you are in. Neither report does this in an effective manner. As a result, whether you think the reports are a good thing or “too little, too late”, the policy conversation will remain deficient until those at the table start using the same structure.