What is Data Mining?
Sorting data into predetermined categories is easy with, say, 10 data sets, but becomes a challenge when the number involved is, say, five billion. Only powerful computers can handle processing on this scale.
Data on the scale of billions of records has become commonplace. Handling it requires automation, and that is what data mining is all about. Data mining is the process of gleaning new information by examining large databases.
The processing is carried out by programs, often drawing on machine learning, that follow given instructions. Data mining techniques can be broadly characterized by the targets they are given. Let us look at some examples.
First, the target could be detecting anomalies in a huge pile of largely identical returns pertaining to property details or to taxes levied or claimed. Even one file in a thousand that differs from the rest will be significant. Second, the target could be learning by association, which is best understood through sales strategies. For instance, if you bought a music player, advertisements offering CDs and the latest hits will be shown to you. Or, if you have mostly been buying tickets to crime thrillers, newly released crime movies will pop up on your screen. A well-known online bookseller invariably adds that those who bought the book you have ordered also bought the other titles listed on your screen. It is an inducement for you to consider buying them.
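The anomaly and association examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `find_anomalies` and `also_bought`, the threshold of 3.5, and the sample data are all hypothetical, and real systems use far more elaborate methods. The first function flags values that sit far from the median; the second counts which items most often appear in the same order as a given item.

```python
from collections import Counter
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the indices of values that lie far from the median.

    Uses the modified z-score (based on the median absolute deviation).
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # The typical values are all identical, so anything different stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def also_bought(orders, item, top=3):
    """Return the items that most often share an order with `item`."""
    counts = Counter()
    for order in orders:
        if item in order:
            counts.update(other for other in order if other != item)
    return [name for name, _ in counts.most_common(top)]


# Hypothetical data: 999 identical tax returns and one that differs.
returns = [1200] * 999 + [9800]
print(find_anomalies(returns))

# Hypothetical order history for the "music player" example.
orders = [
    {"music player", "CD"},
    {"music player", "CD", "headphones"},
    {"book"},
    {"music player", "headphones"},
    {"music player", "CD"},
]
print(also_bought(orders, "music player", top=1))
```

The co-purchase count is the simplest form of association: it says nothing about cause, only that the items tend to be bought together.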
Third, data on the goods ordered can be used to group buyers, by location where possible, for further marketing strategies. For instance, data on dental equipment sold online can be used to build up a profile of demand for such equipment. Such projections need to be qualified by many other conditions. Those who buy nets need not be fishermen; they could be anglers who fish as a hobby.
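A rough sketch of such a demand profile follows. It assumes each order is recorded as a (region, product, quantity) tuple; that record layout, the function name `demand_by_region`, and the sample figures are hypothetical. Note that this groups by a field that is already known, which is simpler than a clustering algorithm that discovers the groups by itself.

```python
from collections import defaultdict


def demand_by_region(orders):
    """Total the quantity ordered per product within each region."""
    profile = defaultdict(lambda: defaultdict(int))
    for region, product, quantity in orders:
        profile[region][product] += quantity
    # Convert to plain dicts so the result prints and compares cleanly.
    return {region: dict(products) for region, products in profile.items()}


# Hypothetical online orders for dental equipment.
sample_orders = [
    ("North", "dental drill", 2),
    ("North", "dental chair", 1),
    ("South", "dental drill", 5),
    ("North", "dental drill", 3),
]
print(demand_by_region(sample_orders))
```

As the text cautions, a profile like this only shows who ordered what and where. It does not reveal why the goods were bought.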
Fourth, computers can be programmed to identify spam on the basis of examples of objectionable or unwanted mail. Lastly, building predictive models from the data gathered has become a professional exercise. Such projections of consumer demand, weather patterns, and production and sales trends have proved quite useful.
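To give a feel for how a program might learn to spot spam from examples, here is a minimal sketch of a word-count classifier in the style of naive Bayes. It is an assumption-laden toy: the function names `train_counts` and `looks_like_spam` and the training messages are hypothetical, both classes are treated as equally likely, and real filters use much richer signals.

```python
import math
from collections import Counter


def train_counts(messages):
    """Count how often each word appears in spam and in normal mail.

    `messages` is a list of (text, is_spam) pairs.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
    return counts


def looks_like_spam(text, counts):
    """Return True if the words in `text` are more typical of spam."""
    vocabulary = set(counts[True]) | set(counts[False])
    spam_total = sum(counts[True].values()) + len(vocabulary)
    ham_total = sum(counts[False].values()) + len(vocabulary)
    score = 0.0
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from producing log(0).
        p_spam = (counts[True][word] + 1) / spam_total
        p_ham = (counts[False][word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score > 0


# Hypothetical labelled examples.
training = [
    ("win free prize now", True),
    ("free money win", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
model = train_counts(training)
print(looks_like_spam("free prize", model))
print(looks_like_spam("monday meeting", model))
```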
Though primarily used for advertising on the Internet, data mining is fast becoming a discipline in its own right, predicting probable trends in many areas of today's uncertain world.
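The simplest kind of trend prediction is a straight line fitted to past figures and extended forward. The sketch below uses ordinary least squares on equally spaced periods; the function names `fit_trend` and `forecast` and the sales figures are hypothetical, and a straight line is only one of many possible models.

```python
def fit_trend(values):
    """Fit a least-squares line to values observed at periods 0, 1, 2, ...

    Returns (slope, intercept). Needs at least two values.
    """
    n = len(values)
    periods = range(n)
    mean_x = sum(periods) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in periods)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Project the fitted line `steps_ahead` periods past the last value."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical quarterly sales figures.
sales = [100, 110, 120, 130]
print(forecast(sales))
```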
How Data Mining affects us all?
Data mining has an impact on individuals. It allows companies and governments to use the information one provides to reveal more than one thinks. Even as we gather more data than we can handle, powerful computers, especially those working with social networks, will gobble up the huge mountains of data and try to make some sense of it, often in response to corporate demands.
The International Data Corporation foresees a high-technology industry built on the convergence of mobile devices, social networks, and cloud-based computing and data storage. Spending on new technologies is growing at six times the rate of spending on traditional computer servers and PCs. This growth will pose new demands, especially for data storage. Cloud computing has arrived just in time. Companies that provide cloud servers to businesses are expected to get more than half of the spending. On privacy, the Corporation says in a report that while there is increased awareness of privacy issues, there is still no sense of immediate urgency. Users trust the system and the convenience it provides as long as no harm is inflicted on a personal level.
A survey by Ericsson Consumer Lab finds that users feel safe sharing such things as music playlists or their religious beliefs, but are least inclined to share data about their medical records or finances.
Big Data Analytics in India
Computer facilities to handle data are coming up in India in a big way. The Indian grid (a network of computers that share resources) called GARUDA (Global Access to Resources Using Distributed Architecture) has a computing capability of close to 70 teraflops (a teraflop is a trillion floating point operations per second).
It may reach exascale (a billion billion flops) by 2016. GARUDA will facilitate data exchange and analysis across a wide range of fields, including health care, bioinformatics and climate modeling. India also has a National Knowledge Network, which connects over 700 institutional networks. In addition, there is ERNET, a national network of academic institutions in the country.
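To put the two figures quoted above side by side, the short calculation below compares 70 teraflops with one exaflop. The variable names are illustrative only, and the numbers are the ones given in the text.

```python
TERAFLOP = 10**12   # a trillion floating point operations per second
EXAFLOP = 10**18    # a billion billion floating point operations per second

garuda_flops = 70 * TERAFLOP

# How many times faster an exascale machine would be than 70 teraflops.
speedup_needed = EXAFLOP / garuda_flops

# Time for 70 teraflops to do the work an exascale machine does in one second.
hours_for_one_exa_second = (EXAFLOP / garuda_flops) / 3600

print(round(speedup_needed))
print(round(hours_for_one_exa_second, 2))
```

In other words, reaching exascale from 70 teraflops means an increase of roughly four orders of magnitude.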
An overview of Big Data and Data Mining is provided in the video below: