# Sentiment Analysis Using Linear Model Algorithms

Social media had a significant influence when the COVID-19 pandemic hit Indonesia and the rest of the world. The pandemic also curtailed daily and economic activities outside the home, making people accustomed to digital platforms, especially for buying products online. This behavior makes social media one of the best online channels for promoting a product. When promoting digitally through social media, the content must attract the attention of social media users. One way to support this is sentiment analysis on text datasets taken from social media; for Twitter, for example, tweets can be retrieved through the Twitter API with the tweepy library.

Once the data has been retrieved, it goes through a preprocessing stage in the Python programming language. The first step is selecting the columns to be used and cleaning the data in the tweet-text column. The next step is converting the words into vectors so that a computer can process them, using Word2Vec and FastText from the gensim library. Two similar-word search processes are run: the first on Indonesian Wikipedia data and the second on a custom-built dictionary. Once preprocessing is complete and the text has become vectors, the next step is to predict responses with a family of linear-model algorithms: linear regression, Bayesian ridge regression, lasso, and ridge regression. The resulting model can help business people find the right words to use in promotional content so that branding goes according to plan and brings in more sales.
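As a minimal sketch of the cleaning step described above (the sample tweet and the exact cleaning rules are illustrative assumptions, not taken from the original dataset):

```python
import re

def clean_tweet(text: str) -> str:
    """Basic tweet cleaning: lowercase, then drop URLs, mentions,
    hash signs, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)      # remove URLs
    text = re.sub(r"@\w+", " ", text)         # remove @mentions
    text = re.sub(r"#", " ", text)            # drop the hash sign, keep the word
    text = re.sub(r"[^a-z\s]", " ", text)     # keep letters only
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(clean_tweet("Promo GRATIS!! cek @tokoku https://t.co/xyz #diskon"))
# → promo gratis cek diskon
```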

Word2Vec is based on the idea of deep learning, where words are represented as vectors. Word2Vec transforms document operations into vector computations in a word-vector space, so semantic relations in documents can be characterized by the similarity of words in that space. The first stage of the Word2Vec process is building a vocabulary from the training text and then learning vector representations for the words. The resulting vectors can be used as features in natural language processing and machine learning applications. FastText, in turn, is an open-source word-embedding method developed by the Facebook Research team for classifying and vectorizing text, and is a development of Word2Vec. It learns word representations by taking subword information into account: each word is represented as a bag of character n-grams.

Read also : Sentiment Analysis Using Levenshtein Distance

**Linear Regressions**

Linear Regression is the modeling and analysis of numerical data consisting of one or more independent variables and the value of a dependent variable. Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the sum of the squared residuals between the observed targets in the dataset and the targets predicted by the linear approximation. Mathematically, it solves a problem of the form:

min_w ||Xw − y||₂²

Linear Regression takes arrays X and y in its fit method and stores the coefficients w of the linear model in its `coef_` member.
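A minimal sketch of this fit-and-inspect workflow, assuming scikit-learn's `LinearRegression` (which the description above mirrors); the toy data is invented so the recovered coefficients are known in advance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 1 + 2x, so the fit should
# recover intercept 1 and slope 2 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # ≈ [2.] and ≈ 1.0
```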

Read also : Sentiment Analysis for Cyberbullying Activities of Teenagers on Social Media During the COVID-19 Pandemic

**Bayesian Ridge Regression**

Bayesian ridge regression estimates a probabilistic model of the regression problem described above. The prior for the coefficient w is given by a spherical Gaussian:

p(w | λ) = N(w | 0, λ⁻¹ Iₚ)

The priors over α and λ are chosen to be gamma distributions, the conjugate prior for the precision of a Gaussian. The resulting model is called Bayesian ridge regression and is similar to classical ridge regression.
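A short sketch using scikit-learn's `BayesianRidge` (an assumption about the implementation, consistent with the description above). The toy data is invented; the probabilistic model yields not just a point prediction but also a predictive standard deviation:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Toy data that is roughly y = x, with a little noise.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.1, 1.1, 1.9, 3.0, 4.0])

br = BayesianRidge().fit(X, y)

# return_std=True exposes the posterior predictive uncertainty,
# which plain ridge regression does not provide.
mean, std = br.predict(np.array([[5.0]]), return_std=True)
print(mean, std)
```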

**Ridge Regression**

Ridge regression is a modification of the least squares method that adds a ridge parameter when determining the regression model's weights, which produces a biased estimator of the regression coefficients. Ridge regression reduces the impact of multicollinearity by selecting a biased estimator with a smaller variance than the multiple linear regression estimator.

The ridge estimator of the regression coefficients is:

B̂ = (XᵀX + λI)⁻¹ XᵀY

Where,

B = Vector of regression coefficients

X = Matrix of predictor (causative) variables

λ = Ridge parameter (0 ≤ λ ≤ 1)

I = Identity matrix (p × p)

Y = Response (consequence) variable
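The multicollinearity-reducing behavior can be sketched with scikit-learn's `Ridge` (the synthetic nearly collinear features below are an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two nearly collinear features: ordinary least squares would give
# large, unstable coefficients here; ridge shrinks and stabilizes them.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, x + 1e-3 * rng.normal(size=100)])
y = 3.0 * x + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # alpha plays the role of λ above
print(ridge.coef_)                   # the effect ≈ 3 is split across both columns
```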

Read also : Introduction to Gamification and Video-Based Learning

**Lasso Regression**

Lasso is a member of the linear-model family that estimates sparse coefficients. This model is helpful in some contexts because it tends to choose solutions with fewer non-zero coefficients, effectively reducing the number of features on which a given answer depends. Under certain conditions, it can recover the exact set of non-zero coefficients. Mathematically, it consists of a linear model with an added regularization term. The objective function to minimize is:

min_w (1 / (2n)) ||Xw − y||₂² + α||w||₁

where n is the number of samples.

The lasso estimate thus solves the minimization of the least-squares penalty with α||w||₁ added, where α is a constant and ||w||₁ is the ℓ1 norm of the coefficient vector.
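The sparsity effect can be sketched with scikit-learn's `Lasso` (the synthetic data below is an illustrative assumption: only the first of three features actually drives the target):

```python
import numpy as np
from sklearn.linear_model import Lasso

# y depends only on the first feature; the other two are pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)   # alpha is the constant α above
print(lasso.coef_)                   # irrelevant coefficients driven to exactly 0
```

This is the feature-reduction behavior described above: the ℓ1 penalty sets the coefficients of the two noise features to exactly zero rather than merely shrinking them.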