# Classification Algorithms in Machine Learning

Machine Learning algorithms can assist in data processing. Which algorithm is used depends on the data collected and on the problem whose solution is being sought. One common task is classification: the step in which objects are grouped into classes that share the same characteristics, based on their defining features. The data used in a classification process usually already carries a label, so each sample can later be assigned to one of the predetermined classes. The data is split into training and testing sets, used respectively for training and evaluating the classifier.

Researchers often compare one classification algorithm against others before choosing the right one: when experimenting, it is good practice to run a trial comparison of algorithms before fully applying one to all of the data. The usual motivation for comparing algorithms is to find out which one achieves the highest accuracy; an algorithm with high accuracy is deemed suitable for processing the data.

Read also: What is Machine Learning?

Here are some classification algorithms that can be implemented in Machine Learning:

**Naive Bayes**

Naïve Bayes is a classification method based on probability calculations. It works in a fast and straightforward way and can still produce high accuracy, which makes it very popular and frequently used. The general Naïve Bayes formula is as follows:

P(H|X) = (P(X|H) × P(H)) / P(X)

where:

- X: data with an unknown class
- H: the hypothesis that data X belongs to a specific class
- P(H|X): probability of hypothesis H given condition X (the posterior probability)
- P(H): probability of hypothesis H (the prior probability)
- P(X|H): probability of X given hypothesis H
- P(X): probability of X
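As an illustration, here is a minimal from-scratch sketch of a Gaussian Naïve Bayes classifier (the function names and toy data are invented for this example, and the Gaussian likelihood is one common choice for continuous features): it estimates a prior and per-feature mean/variance for each class, then picks the class with the highest log-posterior log P(H) + Σ log P(xᵢ|H).

```python
import math
from collections import defaultdict

def train_gaussian_nb(X, y):
    """Estimate per-class prior P(H) and per-feature mean/variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    stats = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # small epsilon keeps the variance strictly positive
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        stats[label] = (len(rows) / len(X), means, variances)
    return stats

def predict_gaussian_nb(stats, x):
    """Return the class with the highest log-posterior for sample x."""
    best, best_score = None, float("-inf")
    for label, (prior, means, variances) in stats.items():
        score = math.log(prior)
        for xi, m, var in zip(x, means, variances):
            # log of the Gaussian density N(xi; m, var)
            score += -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy data: two well-separated classes
X = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]]
y = [0, 0, 1, 1]
model = train_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 1.0]))  # → 0
print(predict_gaussian_nb(model, [5.1, 5.0]))  # → 1
```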

**Support Vector Machine (SVM)**

Support Vector Machine (SVM) is a supervised learning method that classifies data by finding the best hyperplane separating the classes in the input space; its basic principle is linear classification. The first important thing in understanding SVM is the search for this optimal hyperplane. The goal is to separate two classes of data, positive (+1) and negative (-1). Many candidate hyperplanes can separate the two classes, but SVM chooses the one with the maximum margin, where the margin is the distance between the hyperplane and the closest data points of each class (the support vectors). The hyperplane with the largest margin generalizes best and gives better classification results.
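The maximum-margin idea can be sketched with a simple linear SVM trained by sub-gradient descent on the hinge loss. This is a minimal illustration rather than a production solver; the learning rate, regularization strength, and toy data are arbitrary choices for the example.

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss.
    Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: hinge loss is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # outside the margin: only the regulariser acts
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict_svm(w, b, x):
    """Sign of the decision function w·x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy linearly separable data, roughly symmetric around the origin
X = [[-2.0, -2.0], [-1.5, -2.5], [2.0, 2.0], [2.5, 1.5]]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
print(predict_svm(w, b, [-2.0, -1.8]))  # → -1
print(predict_svm(w, b, [2.0, 1.9]))   # → 1
```

The regularization term `lam` is what pushes the solution toward the widest margin rather than just any separating line.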

**Decision Tree**

A decision tree uses a tree-like diagram as a decision support system, and the algorithm has been successfully applied as a classification method. Internal nodes represent attribute tests, branches represent test outcomes, and leaf nodes represent class labels. When selecting the root attribute, the attribute with the highest value of the attribute selection measure (such as information gain) is chosen; the same measure is then used to determine the test attribute for each node in the tree.
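The attribute selection measure mentioned above can be illustrated with a short information-gain computation (entropy reduction after a split); the toy weather-style data is invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    total = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # weighted entropy of the partitions after the split
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return total - remainder

# Toy data: attribute 0 perfectly predicts the class, attribute 1 does not
rows = [["sunny", "hot"], ["sunny", "cool"], ["rain", "hot"], ["rain", "cool"]]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # → 1.0 (perfect split)
print(information_gain(rows, labels, 1))  # → 0.0 (no information)
```

The tree-building algorithm would pick attribute 0 here, since it yields the larger gain.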

**Random Tree**

A random tree is a variant of the decision tree that considers only a random subset of the available attributes at each split. The algorithm consists of two steps. First, it builds a decision tree using part of the data as training data, at each stage choosing the feature and split value that maximize the information gain; the resulting structure is grown repeatedly until the tree reaches its stopping criterion. Second, training data is used to determine the appropriate class by computing class statistics at the leaf nodes: one part of the training data constructs the tree structure, and the rest updates the class probabilities, which track how many samples each leaf node classifies. At test time, each tree outputs class probabilities, and the probabilities of all trees in the ensemble are averaged to estimate the overall likelihood of each class.

This algorithm is easy to implement and can provide strong predictive results, thanks to the advantage of representing the data in the form of a tree compared to other approaches.
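The probability-averaging step at test time can be sketched as follows; the per-tree leaf probabilities below are hypothetical values for a single test point.

```python
def average_tree_probabilities(per_tree_probs):
    """Average the class-probability estimates produced by each tree's leaf."""
    classes = per_tree_probs[0].keys()
    n = len(per_tree_probs)
    return {c: sum(p[c] for p in per_tree_probs) / n for c in classes}

# Hypothetical leaf statistics from three trees for one test point
probs = [
    {"yes": 1.0, "no": 0.0},
    {"yes": 0.5, "no": 0.5},
    {"yes": 0.75, "no": 0.25},
]
print(average_tree_probabilities(probs))  # → {'yes': 0.75, 'no': 0.25}
```

The predicted class is simply the one with the highest averaged probability, here "yes".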

**Random Forest**

Random Forest is an ensemble learning method used for classification, regression, and other tasks. It builds on the decision tree method: each tree is grown from a bootstrap sample of the training data, and at each split a random subset of the attributes is drawn, from which the best feature is selected. Because each tree sees a different random sample of the data and a different random set of features, the trees do not overlap too much, and the classification of unseen data is decided by taking the majority vote over all trees. Combining several tree classifiers in this way can be far more effective than using any single tree studied separately. In addition, the algorithm handles noise and outliers well and is easy to implement.
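The two random ingredients, bootstrap sampling and majority voting, can be sketched like this (the toy data and the random seed are arbitrary choices for the example).

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) examples with replacement (a bootstrap sample)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority_vote(predictions):
    """Final class = the most common prediction across the trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
X = [[1], [2], [3], [4]]
y = ["a", "a", "b", "b"]

# Each tree in the forest would be trained on its own bootstrap sample
Xb, yb = bootstrap_sample(X, y, rng)
print(len(Xb))                              # → 4 (same size, drawn with replacement)
print(majority_vote(["b", "a", "b", "b"]))  # → b
```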

**K-Nearest Neighbor (KNN)**

KNN is one of the easiest algorithms to understand and implement, so many studies apply this method in the classification process. The algorithm classifies a sample by finding the k training examples closest to it and taking the majority class among them, so the result of the classification depends on the value of k. To select this value, the algorithm is usually run several times with different values of k, and the value giving the best performance is kept.
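A minimal from-scratch KNN classifier illustrates the idea (the toy data is invented for the example; Euclidean distance is one common choice of metric).

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    k_labels = [label for _, label in dists[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Two clearly separated clusters
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, [1.5, 1.5], k=3))  # → a
print(knn_predict(X_train, y_train, [8.5, 8.5], k=3))  # → b
```

Odd values of k are often preferred for two-class problems, since they avoid ties in the vote.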

The most suitable classification algorithm can usually be determined by computing the Confusion Matrix of each algorithm and comparing the results to find the highest and best values. A Confusion Matrix is a method for calculating the accuracy of data-mining models: it tabulates the numbers of correctly and incorrectly classified test samples.

Read also: Characteristic of Big Data

|              | Predicted 1 | Predicted 0 |
|--------------|-------------|-------------|
| **Actual 1** | TP          | FN          |
| **Actual 0** | FP          | TN          |

where:

- True Positive (TP) is the number of documents from class 1 that are correctly classified as class 1.
- True Negative (TN) is the number of documents from class 0 that are correctly classified as class 0.
- False Positive (FP) is the number of documents from class 0 that are incorrectly classified as class 1.
- False Negative (FN) is the number of documents from class 1 that are incorrectly classified as class 0.

This table can then be used as a benchmark for calculating the Accuracy, Recall, Precision, and F1-Score values.

Accuracy is the percentage of all samples that are correctly recognized. It is calculated by dividing the number of correctly classified test samples by the total number of test samples: Accuracy = (TP + TN) / (TP + TN + FP + FN).

Precision compares the number of relevant results found with the total number of results found. It is calculated by dividing the true positives by the sum of the true positives and the false positives: Precision = TP / (TP + FP). The false-positive count is taken from the values outside the true-positive column corresponding to each class.

Recall compares the number of relevant results found with the total number of relevant results. It is calculated by dividing the true positives by the sum of the true positives and the false negatives: Recall = TP / (TP + FN). The false-negative count is taken from the values outside the true-positive row corresponding to each class.

Finally, the F1-Score is a single measure of retrieval success that combines Precision and Recall. It is obtained by multiplying Precision and Recall, dividing by their sum, and multiplying the result by two: F1-Score = 2 × (Precision × Recall) / (Precision + Recall).
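These four formulas can be wired up directly from the confusion-matrix counts (the counts in the example are made up for illustration).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Accuracy, Precision, Recall and F1-Score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN out of 100 test samples
acc, prec, rec, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"Accuracy={acc:.2f} Precision={prec:.2f} Recall={rec:.2f} F1={f1:.2f}")
# → Accuracy=0.85 Precision=0.89 Recall=0.80 F1=0.84
```

Note that Precision and Recall are undefined when their denominators are zero (no positive predictions or no positive samples); real implementations handle that edge case explicitly.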