Computing Reviews is the Association for Computing Machinery (ACM) publication review website. It isn’t scholarly journal peer review, but rather, book reviews and critique of trends and new developments in computer science and computing applications.
Computing Reviews runs a monthly series, Featured in Five. No, there’s no singing or dancing, like a musical revue. The same five questions are asked of each month’s featured reviewer, a “reviewer revue”, so to speak. I recommend the series in general. Reviewers are computer science academicians or information technology professionals at various stages of their careers. The variation in answers to the same five questions* is fascinating.
My interest is in applied math, statistics and probability. One post dwelt on data analysis more than the rest, and caught my attention.
Big data and uncertainty
The following is an excerpt from the January 2013 post, for which Vladik Kreinovich was the featured reviewer.
Modern technology allows us to generate, store, and process huge amounts of data… big data can (and will) lead to revolutionary breakthroughs in science, engineering, medicine, and so on. The biggest challenge comes from the fact that all this data comes with uncertainty…
It is essential to gauge data accuracy. Until that is well understood and accounted for, big data will be of limited usefulness.
Why is the accuracy associated with big data different from, or more difficult to study than, “medium” or “small” data uncertainty? Traditional data analysis and data quality assessment were based on sources for which error rates and causes of inaccuracy were known or could be measured, whether deterministically or using models. Kreinovich (computer science, University of Texas at El Paso) gives a specific example:
Processing techniques originated from processing data from well-calibrated sensors, sensors for which we know the probability distribution of measurement errors, for which we can use traditional probabilistic and statistical techniques. The point of big data is to… supplement these “perfect” measurements with numerous less perfect ones—for example, we use temperatures regularly measured by volunteers who use off-the-shelf non-perfectly calibrated sensors. For such data, we often only know the upper bound on the measurement error…
In addition to this bounded interval of data inaccuracy, we also have partial information about probabilities. That is helpful, but it isn’t sufficient to solve the central problem.
Instead of knowing the exact values of the characteristics, we only know the boundary values
Processing data under interval uncertainty is Professor Kreinovich’s primary research interest. The field is known as interval computation among university researchers and industry practitioners worldwide.
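The core idea can be sketched in a few lines of code. This is a minimal, hypothetical illustration (not Professor Kreinovich’s own software, and not a full interval-computation library): a sensor reading with a known upper bound on its measurement error is represented as an interval, and arithmetic on readings propagates the bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A value known only to lie between lo and hi."""
    lo: float
    hi: float

    def __add__(self, other):
        # Sum of two intervals: add the corresponding endpoints.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Difference: subtracting the other interval's upper bound
        # gives our lower bound, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Product: the extremes are among the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


def from_measurement(value: float, error_bound: float) -> Interval:
    """A reading from an imperfectly calibrated sensor, where only
    an upper bound on the measurement error is known."""
    return Interval(value - error_bound, value + error_bound)


# Two volunteer temperature readings, each accurate to within 0.5 degrees.
t1 = from_measurement(21.3, 0.5)   # truth lies in [20.8, 21.8]
t2 = from_measurement(19.7, 0.5)   # truth lies in [19.2, 20.2]

# The difference is only known to within the combined bounds:
# roughly the interval [0.6, 2.6].
diff = t1 - t2
print(diff)
```

Note that no probability distribution over the error is assumed anywhere, only the bound itself; that is exactly the setting the excerpt describes, where off-the-shelf sensors come with an error bound but no calibration curve.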
* The fifth question is always, “What is your favorite type of music?” so thoughts of a musical variety show, a revue, are more justified than initial impressions might have suggested!