Data storage is increasing by leaps and bounds, thanks to the vast spread of sensors, online transactions and clicks. The emerging challenge is to analyse this data and make sense of it quickly, in response to changing demands. The challenge is captured in the working definition of big data: data that is too big, too fast and too hard for existing tools to process. We already have systems that handle data on a petabyte scale (a thousand million million bytes), yet situations such as fraud detection or the sale of goods demand processing in near real time. What is needed are machine-learning algorithms that are easier for ordinary users to apply.
The US Government has announced a ‘Big Data’ initiative to advance the state-of-the-art core technologies needed to collect, store, preserve, manage, analyse and share huge quantities of data. For example, the data from the 1000 Genomes Project, the world’s largest data set on human genetic variation at about 200 terabytes, will be put into the cloud. Another kind of data to be stored relates to our planet and will be of great interest to geoscientists. All of this data will be hosted free on the Amazon Web Services cloud.
Global Pulse, a United Nations initiative, wants to leverage Big Data for global development. It plans to pick up early digital warning signals so that assistance programmes can be guided in advance. The warning is timely, as algorithms increasingly govern an expanding space in our lives.
There can be a downside as well to the emerging data deluge. Computer viruses will have greater scope to attack. Identity impersonation may increase. And intrusions into privacy may go up. These are inevitable consequences of a historic change in the way computers will handle data in the near future.
In 2011, a total of 1.8 trillion gigabytes of data was created. Significantly, about three-fourths of it was produced by ordinary consumers. The trend will continue as people come to expect almost every service from the Internet. The meteoric rise of data on the Internet has a profound impact on the world’s energy resources and pollution levels.
Big Data and Energy Demand
One striking feature of the search engine is the enormous power it consumes. A video on YouTube, for example, showed one of Google’s data centres housing 45,000 servers. It was disclosed that Google places an uninterrupted power supply at each server instead of relying on a centralized supply source. It has been stated that a typical search needs 0.3 watt-hours of electricity, roughly equivalent to lighting a 100-watt bulb for ten seconds.
To handle a billion searches a day, it therefore needs an average of 12.5 million watts, so saving on power is imperative. Google recently disclosed that it draws 260 million watts (equivalent to about one quarter of the output of a typical nuclear power plant) for its data centres around the world. This is considered enough to power 200,000 households. As Internet traffic is expected to increase four-fold in the next five years, Google has set up a data centre on the Baltic coast of Finland. Globally, data centres account for about 1.5 per cent of the total electricity generated.
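The 12.5-million-watt figure follows from simple arithmetic on the numbers quoted above. The short Python sketch below reproduces the back-of-envelope calculation; the per-search figure of 0.3 watt-hours and the one-billion-searches-a-day assumption are taken from the text, and the variable names are purely illustrative:

```python
# Back-of-envelope estimate of the average power needed for search,
# using only the figures quoted in the text above.

ENERGY_PER_SEARCH_WH = 0.3          # watt-hours per search (cited figure)
SEARCHES_PER_DAY = 1_000_000_000    # one billion searches a day (assumption from the text)
HOURS_PER_DAY = 24

daily_energy_wh = ENERGY_PER_SEARCH_WH * SEARCHES_PER_DAY  # 300 million Wh per day
average_power_w = daily_energy_wh / HOURS_PER_DAY          # spread evenly over 24 hours

print(f"Daily energy: {daily_energy_wh / 1e6:.0f} MWh")            # 300 MWh
print(f"Average power: {average_power_w / 1e6:.1f} million watts")  # about 12.5 MW
```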
Social media and search engines make a huge demand on the world’s energy resources, if only to keep themselves free from potential breakdown. Most of the world’s three million data centres, where massed servers handle data from the Internet, consume vast amounts of energy in a wasteful manner. Worldwide, digital data centres use about 30 billion watts of electricity, equal to the output of about 30 nuclear power plants. Although a data centre keeps its servers running at full capacity round the clock, it typically uses on average only 6-12 per cent of that electricity for the computational tasks of its servers. The over-provisioning is done simply to keep the servers running, for fear of a crash lasting even a few seconds. Many servers are labelled idle or comatose by engineers, yet no attempt is made to stop them from idling. Contrary to popular notion, cloud computing does not save energy; the cloud merely changes the location where applications are carried out.
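The scale of this over-provisioning can be illustrated with the same kind of rough arithmetic. The sketch below simply applies the 6-12 per cent utilisation figure to the 30-billion-watt worldwide total quoted above; it is an illustration using only the numbers in the text, not a precise accounting:

```python
# Rough illustration of how much worldwide data-centre power goes to
# computation versus keeping idle servers spinning, per the figures above.

TOTAL_DATA_CENTRE_POWER_W = 30e9   # about 30 billion watts worldwide
UTILISATION_RANGE = (0.06, 0.12)   # 6-12 per cent used for computation

for utilisation in UTILISATION_RANGE:
    useful_w = TOTAL_DATA_CENTRE_POWER_W * utilisation
    overhead_w = TOTAL_DATA_CENTRE_POWER_W - useful_w
    print(f"At {utilisation:.0%} utilisation: "
          f"{useful_w / 1e9:.1f} GW for computation, "
          f"{overhead_w / 1e9:.1f} GW of overhead")
```

Even at the upper end of the range, well over 25 billion watts goes to keeping servers ready rather than to useful work.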
Moreover, the deployment of these power sources, together with back-up generators and batteries, pollutes the atmosphere. Several centres have been found violating air quality regulations.
Big Data poses several challenges. First, identifying which data are relevant is itself a problem. Second, perfect information on which to base corporate decisions is rarely available; decisions are often driven by leadership and by windows of opportunity as perceived by managements. Third, formulating the right questions will matter more than simply going by whatever data have been collected.
It is tempting to recall the prescient words of T.S. Eliot, who asked, “Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?” To this one may add the question posed by a critic: “Where is the information we have lost in data?” Certainly, we have created more than we can comprehend, much less utilize. Perhaps that is a tribute to human ingenuity. As Danny Hillis, the supercomputer designer, says, the greatest achievement of human technology is tools that allow us to create more than we understand.