"Big data" is the latest tech industry buzz-phrase. Depending on whom you ask, it represents either a threat to personal privacy or a revolution in data processing and computing. We'll say this right out of the gate: "big data" means so many things to so many different people that it runs the risk of meaning nothing at all. That said, there are some points on which everyone agrees. Let's dive in.
Wikipedia defines big data as "any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." It's a definition that makes sense, and it's the most common way you'll hear scientists, economists, and statisticians describe it. Put simply, "big data" describes huge amounts of information that are easy to obtain but so massive that they challenge current computing technologies. Big data is the problem you have when information is coming in from multiple sources (computers, satellites, mobile devices, cameras, microphones, and more). That information needs to be moved around, stored (we're talking petabytes and exabytes), and processed.
If that were all, we'd be finished. Unfortunately, "big data" has also turned into an overused marketing phrase. Software companies and IT service providers use it to convey the superiority of their products or the quality of their talent to customers (and to their competition). Startups and Silicon Valley mainstays love to claim "Our systems are ready for the challenges big data provides," or "Our data scientists know how to handle big data." On their own, those statements don't really say much.
So what should you think when you hear "big data"? It depends on the company that's using the phrase. If some tech startup you've never heard of boasts that its "algorithm for processing cat pictures means they're capable of managing big data" and that its service "is like [x company] for [y noun]," then you should probably be skeptical. It's distinctly possible that said company has a revolutionary way to aggregate and make sense of all of the cat pictures on the internet, but it's more likely a marketing slogan. Similarly, the term is often used to confuse you into thinking a service does something more than harvest your data for marketing purposes. So-called "data brokers" like Acxiom, CoreLogic, or DataLogix certainly have tons of data to manage, but when they use the phrase, they're describing who they can harvest data from, how they can process it, and who they can sell it to.
Big data may feel like far-off number crunching in a datacenter somewhere, but it does have real-world implications. Privacy advocates are concerned about massive volumes of information that can be stored in easily accessed (and often insecure) databases, and then sold or traded at will. With a scrap of information, it's not difficult for any company or government agency to build a complete picture of a person, their activities, their purchasing, reading, or browsing habits, and more. Best of all for them, they don't have to collect anything identifiable on their own, and they can use what they get for any purpose they choose.
On the bright side, the problem with big data is part of what makes it so useful: it's impersonal and contextless. Just because the data is good doesn't mean that the decisions made using it will be equally good. For example, Google Flu Trends did all of the right things and sourced its information from all the right places, but incorrectly predicted infection rates for two years in a row. That means someone may be able to build a picture of you, but the data itself still can't accurately predict your behavior or choices. Big data may mean there's a lot of information floating around, but it still requires human beings with the right skillset to sift through the information and make appropriate decisions based on what's been collected. Time will tell what those decisions turn out to be.
At the end of the day, big data, and the companies making a business out of managing it, are paving the way towards some great innovations in science, technology, and medicine. More information is available and being processed than ever before to study climate, genetics, disease and medicine, physics, and more. However, on the consumer side, expect more of your life and lifestyle to be leveraged to make decisions about you that you may otherwise have no say in. As companies scramble to learn more about us, even seemingly unrelated industries will suddenly become useful to one another: your shopping habits will be useful to health insurance companies, and your internet browsing habits will be useful to financial services companies. Unless, of course, you take steps to protect your privacy.
We hope that helps clear the air a bit, Bewitched. It's a deep topic, and because it's a rising industry, it's changing all the time. However, it's important to separate the buzzwords from the facts, and the science from the marketing. Keep an eye on the trend, though: it's not going away, even if the buzzword seems silly.