Image Classification Using Convolutional Neural Networks (CNN)

The basic principle of machine learning is to use data to build statistical models: the model learns patterns contained in past input data and uses them to make predictions on future data. A key advantage of machine learning is this ability to modify and adapt in response to changes in the data.

A Convolutional Neural Network (CNN) is one of the applications of Deep Learning. Architecturally, a CNN belongs to the class of deep feedforward artificial neural networks and is designed to process two-dimensional data. The CNN method combines two stages: classification, performed by feedforward propagation, and learning, performed by backpropagation. Its working principle is similar to that of the MLP method, except that in a CNN each neuron is arranged in two dimensions, whereas in an MLP each neuron has only one dimension. A CNN has several layers, each of which filters the data at its stage of the process. The architecture consists of one input layer, one output layer, and several hidden layers; the hidden layers generally contain convolutional, pooling, ReLU, normalization, fully connected, and loss layers, all arranged in a stack like a sandwich.

CNN Architecture

Convolutional Layer
The convolutional layer is the first layer of the architecture and receives the input directly. It combines linear, local filters to perform the convolution operation. Each filter has a length (in pixels), a width (in pixels), and a depth that matches the number of channels in the input image. The filter slides over the entire image; at each position it performs a "dot" operation between the input values and the filter weights, producing an output called an activation map (feature map).
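The sliding dot product described above can be sketched in a few lines of NumPy. This is a minimal single-channel illustration (valid padding, stride 1) with a made-up 4×4 image and 2×2 filter, not a production implementation:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each step is a dot product
    between the kernel and the local patch (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # the "dot" operation
    return out

image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2],
                  [1, 0, 1, 3]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)  # illustrative 2x2 filter
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3)
```

A 4×4 input convolved with a 2×2 filter yields a 3×3 activation map; with a multi-channel image, the filter would have a matching depth and the patch-times-kernel sum would run over all channels.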

Pooling Layer
The pooling layer is the stage after the convolutional layer. It consists of a filter of a specific size and stride; the filter slides across the entire activation map, moving by the stride at each step. The most commonly used variants are max pooling and average pooling. With 2×2 max pooling and stride 2, each shift of the filter keeps the largest value it covers, whereas average pooling takes the mean value.
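The 2×2, stride-2 case described above can be sketched directly: the map is split into non-overlapping 2×2 blocks and each block is reduced to its maximum (or mean). The feature-map values below are made up for illustration:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value in each block."""
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            out[i // 2, j // 2] = fmap[i:i + 2, j:j + 2].max()
    return out

def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2: take the mean of each block."""
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            out[i // 2, j // 2] = fmap[i:i + 2, j:j + 2].mean()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [3, 8, 4, 1]], dtype=float)
print(max_pool_2x2(fmap))
# [[6. 4.]
#  [8. 9.]]
```

Either way, a 4×4 activation map shrinks to 2×2, which reduces the spatial size passed to later layers.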

Read also : Image Classification Using Transfer Learning and CNN

ReLU Activation
The Rectified Linear Unit (ReLU) is an activation function that gives the network the ability to model non-linear relationships. This layer does not affect the receptive field of the convolutional layer.
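ReLU is simply max(0, x) applied element-wise, so it can be shown in one line; the sample values are made up:

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x): negatives become 0, positives pass through."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```

Because it only zeroes out negative values and leaves positions untouched, it introduces non-linearity without changing which input pixels a convolutional neuron sees.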

Softmax Activation
Softmax activation is a generalization of the Logistic Regression algorithm that can classify three or more classes, whereas standard logistic regression handles the common binary (two-class) case.
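Softmax turns a vector of raw class scores (logits) into probabilities that sum to 1. A minimal sketch with three hypothetical class scores:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max, exponentiate, normalize."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for 3 classes
probs = softmax(logits)
print(probs.round(3))  # [0.659 0.242 0.099]
print(probs.sum())     # 1.0
```

The class with the largest logit gets the largest probability, and the output can be read directly as the network's confidence over the classes.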

Fully Connected Layer
The fully connected layer transforms the data dimensions so that the data can be classified linearly. It receives as input the feature map output by the pooling layer. Because the feature map is still a multidimensional array, it is first reshaped (flattened) into a vector and then mapped to an n-dimensional output, where n is the number of output classes the program must choose from.
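The reshape-then-classify step can be sketched as below. The 4×4×8 feature map size, the random weights, and n = 3 classes are all illustrative assumptions, not values from a particular network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4x8 feature map from the last pooling layer.
feature_map = rng.standard_normal((4, 4, 8))

# Flatten (reshape) the multidimensional array into a 1-D vector.
flat = feature_map.reshape(-1)          # shape (128,)

# Fully connected layer: weights map 128 features to n = 3 output classes.
n_classes = 3
W = rng.standard_normal((n_classes, flat.size))
b = np.zeros(n_classes)
logits = W @ flat + b                   # n-dimensional output vector
print(logits.shape)  # (3,)
```

In a real network the logits would then go through the softmax layer described above to produce class probabilities.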

Read also : Classification of Batik Motifs Image Using Deep Learning Algorithms

Read also : What is Transfer Learning?

One of the most frequently used CNN models is MobileNetV2.

MobileNetV2 Architecture

MobileNet is a lightweight convolutional neural network architecture proposed by a Google team in 2017 and aimed at mobile and embedded devices. MobileNetV2 is a development of MobileNetV1: it keeps the depthwise separable convolution (DSC) technique and introduces inverted residuals and linear bottlenecks. The MobileNetV2 architecture is shown in Figure 2.9. The bottleneck components carry the low-dimensional input and output between blocks, while the inner (expanded) layers encapsulate the model's ability to transform input from lower-level concepts (pixels) into higher-level descriptors (image classes). Shortcut connections between the bottlenecks, like residual connections in traditional CNN architectures, make training faster and improve accuracy.
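A quick parameter count shows why depthwise separable convolution makes MobileNet lightweight: a k×k standard convolution mixes all channels at once, while DSC splits it into a per-channel k×k depthwise step plus a 1×1 pointwise step. The layer sizes below (3×3 kernel, 32 input channels, 64 output channels) are illustrative, and bias terms are ignored:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution."""
    depthwise = k * k * c_in          # one k x k filter per input channel
    pointwise = 1 * 1 * c_in * c_out  # 1x1 convolution mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64   # illustrative layer sizes
std = standard_conv_params(k, c_in, c_out)        # 18432
dsc = depthwise_separable_params(k, c_in, c_out)  # 288 + 2048 = 2336
print(std, dsc, round(std / dsc, 1))  # 18432 2336 7.9
```

For this layer, DSC uses roughly 8× fewer weights than a standard convolution, which is the kind of saving that makes the architecture practical on mobile devices.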
