The popular term Big Data refers to the collection, storage, processing, and analysis of vast quantities of data. At its simplest, it represents a shift in perspective toward the belief that just about any piece of data can provide value. More data is better than less data. The trend is most apparent in areas like business intelligence and government surveillance, but if you look closely, you can see signs of big data on a personal level.
Fitness trackers log every step of every day (okay, often rolled up into 15+ minute intervals) to provide intelligence on your health, progress, and individual workouts. Using a location-based social network to “check in” or rate places yields personalized recommendations. By scanning every email in a Gmail account, Google Now can provide automatic reminders for appointments, travel, bills, and more. On many social networks, creating a social graph (connections to friends) results in the data from each connection being combined to surface relevant trends and recommendations.
By providing our data, whether from a sensor, through access to an account, or as information about who we know, we get to easily (and often automatically) reap the benefits of certain types of data analysis.
The problem is that too much data is being ignored, and we have no way to prevent meaningful, relevant data from falling through the cracks.
The most obvious area for improvement is location. Location-based social networks showed how valuable it can be to know where your friends are. The problem is that they only help if everyone actively “checks in” on the same social network. You can see if your friends have checked in nearby, but not if they’ve shared a geotagged post nearby via Twitter or Instagram. Instead, that metadata is lost, missed, forgotten.
Generally, I care more about the words my friends are tweeting and the photos they are sharing than the metadata on those posts, but more data is better than less data.
The root of the problem is silos. Twitter could add location tools to the Twitter app, but they would only work for the subset of my friends who are tweeting. And why would Twitter add that extra functionality in the first place? Twitter is not a location-based social network.
To get value from the content and metadata people are sharing online in one place or another, one must first extract the data from each silo. Then geographic data can be analyzed for trends or anomalies based on your current location. Text content can be grouped, filtered, or emphasized based on your interests. Photos can be browsed in a photostream or gallery. And any type of data can be searched, queried, summarized, grouped, filtered, or highlighted.
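To make the idea concrete, here is a minimal sketch in Python of what happens once posts have been pulled out of their silos: merge them into one list, keep the geotagged ones, and surface those near your current location. Everything here is an assumption for illustration. The `Post` record, the `nearby_posts` function, and the field names are hypothetical, and the step of actually fetching data from each service is left out entirely.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical common record for a post pulled from any silo."""
    source: str                  # e.g. "twitter" or "instagram"
    author: str
    text: str
    lat: Optional[float] = None  # None when the post carries no geotag
    lon: Optional[float] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def nearby_posts(silos: Iterable[Iterable[Post]],
                 lat: float, lon: float, radius_km: float) -> List[Post]:
    """Merge posts from every silo and return the geotagged ones within
    radius_km of (lat, lon), nearest first."""
    merged = [post for silo in silos for post in silo]
    tagged = [p for p in merged if p.lat is not None and p.lon is not None]

    def distance(p: Post) -> float:
        return haversine_km(lat, lon, p.lat, p.lon)

    return sorted((p for p in tagged if distance(p) <= radius_km), key=distance)
```

Calling `nearby_posts([twitter_posts, instagram_posts], my_lat, my_lon, 2.0)` would return friends' geotagged posts within two kilometers regardless of which service they came from, which is exactly the metadata that a single-service check-in list misses. The same merged list could just as easily be filtered by keyword or grouped by author.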
We have quite a few useful and magical big data tools operating invisibly on our (and our friends’) data, giving us amazing new insights, alerts, and suggestions. But what they provide is dwarfed by what is possible, at our fingertips, and only just barely out of reach.