Big data systems shine light on neglected ‘dark data’

By Jack Vaughan

The notion of «dark data» lurking in the shadows of IT systems has been around for years. But with the increasing adoption of Hadoop and other highly scalable big data technologies, more of that data is poised to come out into the open.

Consulting company Gartner Inc. defines dark data as «information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes.» Now, the ability of Hadoop clusters and NoSQL databases to process large volumes of data makes it more feasible to incorporate such long-neglected information into big data analytics applications and unlock its business value.

As a result, archived data that was «just lying around» has become a potential goldmine for organizations, not simply an untapped pool of information they were obliged to keep for regulatory compliance purposes, said Aashish Chandra, divisional vice president of application modernization at Sears Holdings Corp. in Hoffman Estates, Ill.

«This is a different world we’re living in,» said Chandra, who is also general manager of the big data and legacy systems modernization business in Sears’ MetaScale LLC professional services unit. «People were using backup tapes for archiving. Now you can put that data in Hadoop and query the data in real time.»

In the past, some data was left dark because it was too old to be useful by the time it was made available to business users for analysis. A Hadoop-based data warehouse put into production in February by Edmunds.com Inc. has accelerated that process and opened up new views of data that are helping the company reduce operating costs, said Paddy Hannon, vice president of architecture at the online publisher of car-shopping information in Santa Monica, Calif.

«We’ve had some ‘Eureka’ data moments,» Hannon said. For example, the new system lets the workers who manage keyword acquisition for the company’s paid-search and online advertising efforts quickly probe incoming data to assess how changes in buying tactics will affect marketing initiatives. «That saved a significant amount of money,» Hannon said — more than $1.7 million as of mid-June, according to a blog post by Philip Potloff, chief information officer at Edmunds.

Gartner analyst Merv Adrian told attendees at the Hadoop Summit 2013, held in June in San Jose, Calif., that he expects more and more companies to begin auditing their archives of data to identify dark bits and try to map them to possible business uses.

«Much of what we’re doing with big data is restoration of context,» Adrian said. Expanding on Gartner’s definition, he described data darkness as a state where «you know the transaction happened, but you don’t know what went on around it» — something that needs to be illuminated in order to turn dark data into business gold.