Definition of Data Mining
Miscellanea / July 04, 2021
By Guillem Alsina González, in Nov. 2018
I have long heard the maxim that data is the new oil, but judging by the name of one of the disciplines devoted to exploiting it, so-called data mining, I would rather call it "the new coal", by analogy with the way it is extracted.
Data mining is a discipline that consists of drawing conclusions from the automated statistical analysis of a large collection of data.
This data can come from many sources, have different structures, or not be structured at all. For this reason, data mining involves artificial intelligence and machine learning systems capable of adapting to unstructured data and passing it through filters that make it analyzable.
In the end, the point is for the conclusions to support decision making about a given system, which can vary widely: from road traffic in a city or region to the provision of firefighters and other public services for possible emergencies.
It is also about uncovering patterns that the data follow and that, until now, were hidden or could not be seen clearly amid the morass of existing data.
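As a rough illustration of what "uncovering a pattern" can mean, here is a minimal sketch that groups traffic counts by hour of day to reveal which hours are busiest. The observations, the function names and the choice of a simple average are all assumptions made for the example. Real data mining systems use far richer techniques.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (hour of day, vehicles counted in that hour)
OBSERVATIONS = [
    (8, 410), (8, 430),
    (13, 150), (13, 170),
    (18, 380), (18, 400),
    (3, 20), (3, 30),
]

def average_by_hour(observations):
    """Group the counts by hour and average each group."""
    by_hour = defaultdict(list)
    for hour, count in observations:
        by_hour[hour].append(count)
    return {hour: mean(counts) for hour, counts in sorted(by_hour.items())}

def busiest_hours(observations, top=2):
    """Return the hours with the highest average count, busiest first."""
    averages = average_by_hour(observations)
    return sorted(averages, key=averages.get, reverse=True)[:top]

print(busiest_hours(OBSERVATIONS))
```

With this invented sample, the morning and evening rush hours stand out from the rest, which is the kind of regularity that stays hidden while the data is only a pile of individual readings.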
What separates data mining from big data? Mining deals only with analysis, while big data is a discipline responsible for capturing and storing data, as well as administering it.
To analyze the data correctly, we must first determine the objectives we pursue with the analysis: a series of questions we need to answer, since these will guide where we search.
Starting from these questions, framed as premises, we choose the data to process (we may need only part of the database, not all of it).
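To make this selection step concrete, the sketch below assumes the question is "how heavy is weekday rush-hour traffic?" and keeps only the records relevant to it. The record layout, the sample values and the function name are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Hypothetical records: (street, timestamp, vehicles counted in that hour)
RECORDS = [
    ("Main St", datetime(2018, 11, 5, 8), 420),
    ("Main St", datetime(2018, 11, 5, 14), 180),
    ("Oak Ave", datetime(2018, 11, 5, 8), 95),
    ("Oak Ave", datetime(2018, 11, 10, 8), 40),  # a Saturday
]

def select_for_question(records, hours, weekdays_only=True):
    """Keep only the rows that are relevant to the question being asked."""
    selected = []
    for street, when, count in records:
        if when.hour not in hours:
            continue  # outside the hours we care about
        if weekdays_only and when.weekday() >= 5:
            continue  # weekday() is 5 for Saturday and 6 for Sunday
        selected.append((street, when, count))
    return selected

rush_hour = select_for_question(RECORDS, hours={7, 8, 9})
print(len(rush_hour), "of", len(RECORDS), "records kept")
```

Only half of this tiny sample survives the filter. On a real database the saving is what makes the later processing tractable.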
The processing phase differs in each case. It uses artificial intelligence and machine learning tools, which can adapt dynamically to the data entered and modify their operations if necessary.
The end product of this processing should be a series of conclusions, but let's not confuse these with the ones to be drawn by those responsible for the system or those who make the final decisions. The conclusions of the analysis concern only the volume of data analyzed.
Returning to the example of road traffic in a city, we may conclude that a certain street receives an excessive flow of vehicles, but the system will not give us magic recipes to solve that excess.
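A minimal sketch of how such a conclusion might be reached: the code flags streets whose average flow is well above what is typical for the city. The counts are invented, and the rule used here (more than 1.5 times the median of the street averages) is an arbitrary assumption for the example, not a standard threshold.

```python
from statistics import mean, median

# Hypothetical hourly vehicle counts per street
COUNTS = {
    "Main St": [420, 390, 450],
    "Oak Ave": [95, 110, 100],
    "Elm Rd": [130, 120, 140],
    "Pine Ln": [80, 90, 85],
}

def flag_excess_flow(counts, factor=1.5):
    """Return the streets whose average flow exceeds factor x the median average."""
    averages = {street: mean(values) for street, values in counts.items()}
    typical = median(averages.values())
    return sorted(street for street, avg in averages.items() if avg > factor * typical)

print(flag_excess_flow(COUNTS))
```

Note that the output is only a list of streets. It states where the excess is, not what to do about it, which is exactly the limit described above.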
Even if the system has artificial intelligence that can propose solutions, the last word will always belong to human personnel.
Data mining is applied in practice in a large number of disciplines, among which finance stands out.
Thus, we can find applications in areas such as the stock market (to predict the behavior of stocks), but also in sectors that are not strictly financial yet are closely related to finance, such as insurance.
Natural language processing, online search and smart cars are other fields in which data mining is being applied.