What is Data Mining?

Thebestindonesia.com – Data mining is a term commonly used for the discovery of hidden knowledge in databases. It is a semi-automated process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify valuable, useful information from large volumes of stored data. Data mining, also known as knowledge discovery in databases (KDD), is an activity that includes collecting and using historical data to find regularities, patterns, or relationships in large data sets. Its output can be used to improve future decision-making. Because pattern recognition is treated as part of data mining, that term is now rarely used on its own.

In simple terms, data mining is the discovery of new information by looking for specific patterns or rules in a massive amount of data. It is often described as a series of processes for extracting added value from a data set, in the form of knowledge that could not be found manually; put another way, it is the search for interesting patterns in large amounts of data. The data may be stored in databases, data warehouses, or other information stores. Data mining is related to other disciplines, such as database systems, data warehousing, statistics, machine learning, information retrieval, and high-performance computing. It is also supported by fields such as neural networks, pattern recognition, spatial data analysis, image databases, and signal processing.

Data mining is defined as the process of finding patterns in data. This process is automatic or, more often, semi-automatic. The patterns found must be meaningful and must provide benefits, usually economic ones. Large amounts of data are needed. Based on these definitions, data mining can be summarized as a technique for digging up hidden, valuable information in a vast data collection (database) so that interesting, previously unknown patterns are found.

Characteristics of Data Mining
As noted above, data mining uses historical data to find regularities, patterns, or relationships in large data sets. Its characteristics are as follows:

  1. Data mining is concerned with discovering hidden things and specific data patterns that were not previously known.
  2. Data mining usually uses very large data sets, which makes the results more credible.
  3. Data mining helps in making critical decisions, especially strategic ones.

Several factors drive the continued rapid advances in the field of data mining:

  1. The rapid growth of data collections.
  2. Data storage in data warehouses, so that an entire company has access to a good database.
  3. Increased data access through web and intranet navigation.
  4. The pressure of business competition to increase market share in a globalized economy.
  5. The development of software technology for data mining (technology availability).
  6. Great advances in computing capability and the expansion of storage media capacity.

Data Mining Process
Data mining is not an entirely new field. One of the difficulties of defining data mining is that it inherits many techniques and aspects from preexisting fields of knowledge. The figure below shows that data mining has deep roots in fields such as artificial intelligence, machine learning, statistics, databases, and information retrieval.

Fields of science underlying data mining (Najafabadi et al., 2015)

Data mining and Knowledge Discovery in Databases (KDD) are often used interchangeably to describe the process of extracting hidden information from a large database. The two terms actually denote different concepts, but they are related: data mining is one of the stages in the overall KDD process. The KDD process can be described as follows:

1. Data Selection
Before the information-mining stage of KDD begins, data must be selected from the set of operational data. The selected data to be used in the data mining process is stored in a file separate from the operational database.
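
As a rough illustration of this step, the sketch below selects a subset of rows and columns and writes it to a separate file. It assumes the operational data sits in a SQLite table; the table name `sales`, its columns, the sample rows, and the file name `selected_sales.csv` are all hypothetical and not taken from any real system.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical operational database, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER, customer TEXT, item TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "ani", "bread", 2022),
        (2, "budi", "milk", 2023),
        (3, "ani", "milk", 2023),
        (4, "citra", "bread", 2023),
    ],
)

# Select only the rows and columns relevant to the mining task.
selected = conn.execute(
    "SELECT customer, item FROM sales WHERE year = ? ORDER BY id", (2023,)
).fetchall()
conn.close()

# Store the selection in a file separate from the operational database.
out_path = os.path.join(tempfile.mkdtemp(), "selected_sales.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer", "item"])
    writer.writerows(selected)

print(f"{len(selected)} rows written to {out_path}")
```

Working on a copy keeps later cleaning and transformation steps from touching the operational database.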
2. Pre-processing/Cleaning
Before data mining can be carried out, the data that is the focus of KDD must be cleaned. Cleaning includes removing duplicate data, checking for inconsistent data, and correcting errors in the data, such as typographical errors.

Enrichment is also carried out at this stage: existing data is "enriched" with other relevant data or information needed for KDD, such as external data.
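
Cleaning and enrichment might look like the following minimal sketch. The records, the typo-correction table `CITY_FIXES`, and the external lookup `CITY_REGION` are hypothetical; real cleaning rules depend on the data at hand.

```python
# Hypothetical raw records: (customer, city, item).
raw = [
    ("ani", "Jakarta", "bread"),
    ("ani", "Jakarta", "bread"),    # exact duplicate
    ("budi", "jakarta ", "milk"),   # inconsistent casing and whitespace
    ("citra", "Bandnug", "bread"),  # typographical error
]

# Assumed correction table for known typos.
CITY_FIXES = {"bandnug": "bandung"}

# Assumed external data used for enrichment.
CITY_REGION = {"jakarta": "DKI Jakarta", "bandung": "West Java"}


def clean(records):
    """Normalize city names, fix known typos, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for customer, city, item in records:
        city = city.strip().lower()
        city = CITY_FIXES.get(city, city)
        row = (customer, city, item)
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


def enrich(records):
    """Append a region looked up from the external table to each row."""
    return [row + (CITY_REGION.get(row[1], "unknown"),) for row in records]


cleaned = clean(raw)
enriched = enrich(cleaned)
for row in enriched:
    print(row)
```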
3. Transformation
Coding is the process of transforming the selected data so that it is suitable for the data mining process. Coding in KDD is a creative process and depends heavily on the type or pattern of information to be searched for in the database.
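
One possible coding, sketched here under the assumption that the goal is to mine purchase patterns, turns (transaction, item) records into 0/1 vectors. The record layout and item names are hypothetical, and other mining goals would call for different codings.

```python
from collections import defaultdict

# Hypothetical cleaned records: (transaction_id, item).
records = [(1, "bread"), (1, "milk"), (2, "bread"), (3, "milk"), (3, "eggs")]

# Group the items that belong to each transaction.
baskets = defaultdict(set)
for tid, item in records:
    baskets[tid].add(item)

# Fix a column order, then code each basket as a 0/1 vector over all items.
items = sorted({item for _, item in records})
encoded = {
    tid: [int(item in basket) for item in items]
    for tid, basket in sorted(baskets.items())
}

print(items)
for tid, vector in encoded.items():
    print(tid, vector)
```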
4. Data mining
Data mining is the process of finding interesting patterns or information in the selected data using specific techniques or methods. The techniques, methods, and algorithms used in data mining vary greatly. The choice of the right method or algorithm depends heavily on the goals and on the overall KDD process.
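
As one small example of such a technique, the sketch below counts how often pairs of items occur together and keeps the pairs that meet a minimum support. This is plain brute-force pair counting, not a full algorithm such as Apriori; the transactions and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions) >= min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(transactions, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```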
5. Interpretation/Evaluation
The patterns of information produced by data mining need to be presented in a form that is easy for interested parties to understand. This stage, called interpretation, is part of the KDD process. It includes checking whether the patterns or information found contradict previously existing facts or hypotheses.
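
A minimal sketch of this check, assuming the discovered pattern is an association rule "bread implies milk": it computes the rule's confidence and lift and compares the confidence with a prior hypothesis. The transactions and the threshold `hypothesis_min_confidence` are hypothetical.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread"},
]


def support(itemset):
    """Share of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def evaluate_rule(antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent.

    Assumes both sides occur in at least one transaction.
    """
    confidence = support(antecedent | consequent) / support(antecedent)
    lift = confidence / support(consequent)
    return confidence, lift


confidence, lift = evaluate_rule({"bread"}, {"milk"})

# Assumed prior hypothesis: at least 80% of bread buyers also buy milk.
hypothesis_min_confidence = 0.8
contradicts = confidence < hypothesis_min_confidence

print(f"bread -> milk: confidence={confidence:.2f}, lift={lift:.2f}")
print("contradicts prior hypothesis:", contradicts)
```

A lift below 1 suggests the two items appear together less often than they would if they were independent, which is the kind of finding that should be reported in plain terms to the interested parties.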
