According to the Harvard Business Review, we create 2.5 exabytes of data daily. An exabyte is 1 quintillion bytes, and a quintillion is 1 followed by eighteen zeros. To put this in context, in 1986 there were about 2.5 exabytes in existence everywhere. This was before the widespread creation and dissemination of digital information; the rate of data creation accelerated sharply after the advent of the internet. In 2000, there were still only about 55 exabytes around. That’s barely three weeks’ worth of data creation in 2013.
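The "three weeks" comparison can be checked with simple arithmetic. The short sketch below uses only the figures quoted above; the variable and function names are ours, chosen for illustration.

```python
# Figures quoted in the text above, in exabytes.
DAILY_CREATION_EB = 2.5      # data created per day (2013 estimate)
TOTAL_IN_2000_EB = 55.0      # all data in existence in 2000
BYTES_PER_EXABYTE = 10**18   # 1 quintillion bytes


def days_to_create(total_eb, daily_eb=DAILY_CREATION_EB):
    """Return how many days of data creation equal the given total."""
    return total_eb / daily_eb


days = days_to_create(TOTAL_IN_2000_EB)
weeks = days / 7
daily_bytes = DAILY_CREATION_EB * BYTES_PER_EXABYTE

print(f"{days:.0f} days, about {weeks:.1f} weeks")
print(f"{daily_bytes:.1e} bytes created per day")
```

At 2.5 exabytes a day, the 55 exabytes that existed in 2000 amount to 22 days of output, or just over three weeks.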
More significant than size is where and in what format this information exists. Ninety percent is what is called unstructured data: digital information not contained within formal databases that is generally uncollectible or unusable using standard correlative methodologies. Unstructured data consists of such things as GPS signals emitted by cell phones and automobiles, webpages, tweets, internet search histories, PDFs, and handwritten notes. Altogether this is called “Big Data,” and the world is drowning in it.
No one really knows where or when the phrase Big Data was first used, although Steve Lohr of the New York Times credits Dr. John Mashey, a Silicon Valley pioneer, with giving this tiny phrase the expansive meaning it has today. Initially, Big Data meant just that: information files too large to be stored or analyzed on 20th-century hardware. Working from Google’s public description of its very sophisticated slicing and dicing of the data it harvested to index the web, Doug Cutting and Mike Cafarella created software that stored and processed all forms of digital information on multiple servers simultaneously. Cutting named the software Hadoop, after the name his toddler son had given to a favorite stuffed elephant.
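The idea of splitting work across many servers is usually described as a "map" step followed by a "reduce" step. The sketch below imitates that pattern on a single machine, counting words across two chunks of text. It is an illustration only: it does not use Hadoop or its API, and the function names and sample text are our own assumptions.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a cluster would between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: add up the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


# In a real cluster each chunk would sit on a different server;
# here the chunks are simply processed one after another.
chunks = ["big data is big", "data is everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))

print(counts)
```

Because each chunk is mapped independently, the map step can run on as many machines as there are chunks, which is what makes the approach scale.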
Once data could be Hadooped, it was not long before digital scientists developed software designed to extract meaning from this trove of information. Amazon and Google were early creators of Big Data analytics. They designed algorithms that identified their users’ wants and interests by tracking and cataloging earlier purchases and search histories. But Big Data software should and can do much more than that.
Effective Big Data analytics discovers hidden patterns, creates context for decision-making by turning data points into a cohesive story, and helps solve problems by determining why things happen and predicting when they will happen in the future. Big Data was used by Netflix to create the new blockbuster hit, House of Cards, and by linguists and literary historians to determine that Jane Austen and Sir Walter Scott had the greatest impact on 19th- and 20th-century writers. Kaiser Permanente, the health care giant, uses it to track its patients’ medical treatments and outcomes, and used it to discover early on the harmful side effects of Vioxx.
New Big Data platforms store and process structured and unstructured data at speeds never contemplated by digital scientists as recently as ten years ago. The review and interpretation of information that used to require three days of mainframe computing now happens in ten minutes. Information technology has conquered the three V’s that have confounded business intelligence analysts, economists, and other professional prognosticators for years: volume, velocity, and variety.
But is Big Data just a 21st century parlor trick? Are we interpolating and extrapolating ourselves into delusion or oblivion? Isn’t it really just another marketing tool developed to help big business sell more widgets?
MIT professors Andrew McAfee and Erik Brynjolfsson sought to measure the efficacy of Big Data analytics used by a range of businesses and concluded it was a “Management Revolution.” Using empirical methods, they determined that businesses that collected, stored, and analyzed relevant internal and external data using Big Data methodologies were more successful, better run, and better able to anticipate and respond to change than their counterparts that did not.
Privacy and other concerns have led legislators and legal scholars to call for greater regulation of Big Data collection and for consumers to be able to “opt out” without having to live off the grid. At the same time, they acknowledge that the benefits of using Big Data to anticipate the lethal side effects of new drugs, or to help regulators detect illegal activity before its effects can be felt, outweigh its resemblance to Big Brother.
Big Data skeptics, like editorial writer David Brooks, are quick to point out that even the best-tested algorithms, augmented by the most advanced forms of artificial intelligence (the “special sauce” of all successful Big Data platforms), do not take social cognition into account, cannot weigh the relevance of multiple intersecting contexts, create spurious, albeit statistically significant, correlations, and obscure values in the decision-making process.
These criticisms may be valid, but Big Data is here to stay, and it is changing the worlds of investigations, corporate compliance, and integrity monitoring.
Using its own Big Data analytic platform, developed with FusionExperience, a UK-based technology firm, Guidepost Solutions is able to conduct investigations of possible FCPA, ITAR, OFAC, BSA, AML, OECD, and UK Bribery Act violations around the world in half the time and for half the cost. Our analytic platform, called Guidepost Insight, collects structured data from any pre-existing internal or external database and unstructured data from virtually any resource, including the web, and indexes every word, number, and symbol. Then, using simple, case-specific rules designed in conjunction with our clients and industry experts and powered by artificial intelligence, it sifts through terabytes of data, zeroing in on problem areas, employees, or transactions.
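To make the "index everything, then apply simple rules" idea concrete, here is a minimal sketch. It is not how Guidepost Insight is implemented; the documents, the rule, and the function names `build_index` and `apply_rule` are all hypothetical and exist only to illustrate the general technique of an inverted index queried by a keyword rule.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every word or number to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def apply_rule(index, required_terms):
    """Return the documents that contain every term named in the rule."""
    matches = [index.get(term, set()) for term in required_terms]
    return set.intersection(*matches) if matches else set()


# Hypothetical sample documents, invented for this illustration.
documents = {
    "email-1": "Please wire the consulting fee to the agent before the tender closes",
    "email-2": "Lunch on Friday?",
    "invoice-7": "Consulting fee paid to agent, no deliverables listed",
}

index = build_index(documents)
# Hypothetical case-specific rule: flag anything mentioning all three terms.
flagged = apply_rule(index, ["consulting", "fee", "agent"])

print(sorted(flagged))
```

Once the index exists, each new rule is a cheap lookup rather than a fresh pass over every document, which is why indexing first pays off when the volume reaches terabytes.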
Using a business procedure overlay called Business Optix, Guidepost Insight can be used as a compliance monitor that identifies “exceptions” to legally required procedures and processes. Used together, Insight and Optix give Guidepost the ultimate integrity-monitoring system, applicable to any industry or business model. In construction and other project monitoring, the system draws data from multiple external and internal resources, as disparate as site access control signals and waste hauling trip tickets, to identify no-show employees, code violations, theft, and other forms of fraud and waste. It displays the results of these analyses in a visual format that highlights normal and abnormal relationships and hot spots of possible illegal activity.
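The no-show example comes down to cross-referencing two data sources that should agree. The sketch below compares the days each worker was paid for with the days the site access system recorded them entering. The data, names, and the function `find_no_shows` are hypothetical and are not drawn from the Guidepost platform.

```python
def find_no_shows(payroll_days, access_swipes):
    """Return, per employee, the paid days with no recorded site entry."""
    no_shows = {}
    for employee, paid in payroll_days.items():
        missing = paid - access_swipes.get(employee, set())
        if missing:
            no_shows[employee] = sorted(missing)
    return no_shows


# Hypothetical records: days each worker was paid for ...
payroll_days = {
    "worker-a": {"2013-06-03", "2013-06-04", "2013-06-05"},
    "worker-b": {"2013-06-03", "2013-06-04"},
    "worker-c": {"2013-06-03"},
}
# ... and days the access control system logged them on site.
access_swipes = {
    "worker-a": {"2013-06-03", "2013-06-04", "2013-06-05"},
    "worker-b": {"2013-06-03"},
}

exceptions = find_no_shows(payroll_days, access_swipes)
print(exceptions)
```

A mismatch is only a lead, not proof of fraud: a broken turnstile or a missed swipe produces the same exception, which is why the results still go to a professional for review.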
Big Data is a big deal because sophisticated investigative and analytical tasks that used to take weeks or months, using expensive and imprecise human capital, can now be performed in days or hours, using less expensive and more precise software. It creates a better and more complete data picture, enabling professionals to perform the more important tasks of making decisions, managing risk, enforcing laws, and regulating industries in a better-informed and more reliable way.