Category: Expert stories

Machine Learning Algorithm: Linear Regression

Linear regression is one of the most popular algorithms in both Statistics and Machine Learning. It is the simplest type of Machine Learning Algorithm.

The article on Linear Regression is the first of a set of blog articles that will go into this level of detail about different aspects of Machine Learning.

By Alan Lehane, Developer

Following on from my previous blog, Machine Learning in Theory: Creating a Full Stack Application with a Machine Learning Component Hosted on Microsoft's Azure Platform, this article goes into more technical detail about a specific type of machine learning algorithm: linear regression.

Because linear regression is one of the most popular algorithms in both statistics and machine learning, and the simplest, it is a good place to start. In this blog, I am going to explain how ML algorithms work in general, using linear regression as an example.

This type of algorithm is best used to predict trends or to find correlations in data sets, and it is typically trained in a supervised learning setting.

Linear regression establishes a linear relationship between the input variables (x) and an output variable (y): the output variable is calculated as a linear combination of the input variables.
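As a minimal sketch in Python (the coefficient values and inputs are purely illustrative), a prediction is just such a linear combination:

```python
# Linear combination: y = b0 + b1*x1 + b2*x2 + ...
def predict(coefficients, intercept, inputs):
    """Predict y as a weighted sum of the input variables plus an intercept."""
    return intercept + sum(b * x for b, x in zip(coefficients, inputs))

# Two input variables with illustrative coefficients:
# y = 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
print(predict([2.0, 0.5], 1.0, [3.0, 4.0]))  # 9.0
```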


What does Training / Learning Mean?

When a linear regression algorithm learns/trains on a data set, it is essentially working out what the coefficients of the linear equation should be.

 

Supervised Learning

Supervised learning refers to the type of dataset used to train the ML algorithm: in a supervised dataset, the output variable (y) of each example in the dataset is a known value. I will cover unsupervised learning in a future blog.

 

Example – Predicting house prices

If I am training an algorithm to predict house prices in my area, I would gather a dataset of houses that have previously been sold in my area.

I would gather as many features of the sold houses as I could. Certain features are more important than others, but all features play some role, so the more the better.
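Such a supervised dataset might look like the following sketch, where every example pairs input features with a known selling price (all figures are invented for illustration):

```python
# A hypothetical supervised dataset: each example records input features
# together with the known output variable (the actual selling price).
houses = [
    {"size_m2": 90,  "bedrooms": 2, "garage": 1, "price": 250_000},
    {"size_m2": 120, "bedrooms": 3, "garage": 1, "price": 310_000},
    {"size_m2": 70,  "bedrooms": 2, "garage": 0, "price": 195_000},
]
```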


In our example, our goal is a linear equation that will predict the selling price of a house in the future.


When training on our example dataset, the ML model will iterate through the dataset, inserting the input values into the formula and comparing its predicted price to the actual price. If the predicted price and the actual price differ, the model adjusts the coefficients and continues.

This process continues for a set number of iterations or until an accuracy threshold has been achieved. The amount by which the coefficients are altered depends on the type of linear regression algorithm used.

Dealing with Non-Numerical Input Variables

Typically, non-numerical (categorical) input or output variables are assigned a numeric value to represent each possible option of the variable.

For example, for the Color input variable: White = 1, Yellow = 2, Red = 3, and so on. For binary input variables like Garage: No = -1 and Yes = 1.
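The mapping above can be sketched as a simple lookup (the code mappings mirror the example values in the text):

```python
# Assign each category of a non-numerical variable a numeric code.
COLOR_CODES = {"White": 1, "Yellow": 2, "Red": 3}
GARAGE_CODES = {"No": -1, "Yes": 1}

def encode(color, garage):
    """Convert categorical inputs into the numeric values the model expects."""
    return COLOR_CODES[color], GARAGE_CODES[garage]

print(encode("Red", "Yes"))  # (3, 1)
```

Note that integer codes imply an ordering between categories; for unordered categories such as colour, one-hot encoding is often preferred in practice.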

Types of Linear Regression

 

Ordinary Least Squares

Ordinary Least Squares attempts to minimize the sum of the squared residuals. Imagine our dataset plotted as points on a plane, where each point is one example in our dataset. OLS draws the line through those points that minimizes the total squared vertical distance from the line to every point.
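For a single input variable, OLS has a well-known closed-form solution. A minimal pure-Python sketch (the data points are illustrative):

```python
def ols_fit(xs, ys):
    """Closed-form OLS for one input variable: minimizes the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept shifts the line
    # so that it passes through the point of means.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover those coefficients
print(ols_fit([1, 2, 3], [3, 5, 7]))  # (2.0, 1.0)
```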

 

Gradient Descent

With gradient descent, the coefficients are initially determined randomly and then adjusted in steps whose size depends on the learning rate (α), which is set by the user. Each coefficient is adjusted in the direction that reduces the error.
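A sketch of gradient descent for a single input variable, assuming a mean-squared-error cost (the data, learning rate, and iteration count are illustrative):

```python
import random

def gd_fit(xs, ys, alpha=0.05, iterations=2000):
    """Fit y = b0 + b1*x by gradient descent, starting from random coefficients."""
    b0, b1 = random.random(), random.random()
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean squared error with respect to each coefficient
        grad_b0 = sum((b0 + b1 * x) - y for x, y in zip(xs, ys)) * 2 / n
        grad_b1 = sum(((b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) * 2 / n
        # Step in the direction that reduces the error, scaled by alpha
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1

b0, b1 = gd_fit([1, 2, 3], [3, 5, 7])
print(round(b0, 2), round(b1, 2))  # ≈ 1.0 2.0
```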

 

Regularization

Regularization attempts to reduce the complexity of the model while also reducing the error at the same time. Two examples of regularization are:

 

Lasso Regression

Lasso Regression is a modified OLS that also minimizes the sum of the absolute values of the coefficients (an L1 penalty).

 

Ridge Regression

Ridge Regression is also a modified OLS; it additionally minimizes the sum of the squared coefficients (an L2 penalty).
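The three cost functions can be sketched side by side; lam, the penalty weight, is an illustrative hyperparameter (often written λ or alpha in libraries):

```python
def ols_cost(residuals):
    """Plain OLS: sum of the squared residuals."""
    return sum(r ** 2 for r in residuals)

def lasso_cost(residuals, coefs, lam=1.0):
    """Lasso adds an L1 penalty: the sum of absolute coefficient values."""
    return ols_cost(residuals) + lam * sum(abs(b) for b in coefs)

def ridge_cost(residuals, coefs, lam=1.0):
    """Ridge adds an L2 penalty: the sum of squared coefficient values."""
    return ols_cost(residuals) + lam * sum(b ** 2 for b in coefs)

residuals, coefs = [1.0, -1.0], [3.0, -2.0]
print(lasso_cost(residuals, coefs))  # 2 + (3 + 2) = 7.0
print(ridge_cost(residuals, coefs))  # 2 + (9 + 4) = 15.0
```

Because both penalties grow with the size of the coefficients, minimizing these costs pushes the model toward smaller (simpler) coefficients while still fitting the data.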

 

Conclusion

In this blog, we covered:

  • How linear regression works in detail
  • Different types of linear regression
  • Core machine learning terminology and concepts

Alan Lehane, Software Developer
Alan has been working with Aspira for 4 years as a Software Developer, specialising in Data Analytics and Machine Learning. He has provided a wide variety of services to Aspira’s clients including Software Development, Test Automation, Data Analysis and Machine Learning.

 
