Category: Expert stories

Machine Learning Algorithm: Linear Regression

Linear regression is one of the most popular algorithms in both Statistics and Machine Learning. It is the simplest type of Machine Learning Algorithm.

The article on Linear Regression is the first of a set of blog articles that will go into this level of detail about different aspects of Machine Learning.

By Alan Lehane, Developer

Following on from my previous blog, Machine Learning in Theory: Creating a Full Stack Application with a Machine Learning Component Hosted on Microsoft's Azure Platform, this article goes into more technical detail about a specific type of machine learning algorithm: linear regression.

Because linear regression is one of the most popular algorithms in both statistics and machine learning, and the simplest, it is a good place to start. In this blog, I am going to explain how ML algorithms work in general, using linear regression as an example.

This type of algorithm is best used to predict trends or to find correlations in data sets, and it is typically trained in a supervised learning setting.

Linear regression establishes a linear relationship between the input variables (x) and an output variable (y): the output variable is calculated as a linear combination of the input variables.
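As a minimal sketch in Python (the coefficient values and inputs are purely illustrative), a prediction is just such a linear combination:

```python
# Linear combination: y = b0 + b1*x1 + b2*x2 + ...
def predict(coefficients, intercept, inputs):
    """Predict y as a weighted sum of the input variables plus an intercept."""
    return intercept + sum(b * x for b, x in zip(coefficients, inputs))

# Two input variables with illustrative coefficients:
# y = 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
print(predict([2.0, 0.5], 1.0, [3.0, 4.0]))  # 9.0
```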


What does Training / Learning Mean?

When a linear regression algorithm learns/trains on a data set, it is essentially working out what the coefficients of the linear equation should be.

 

Supervised Learning

Supervised learning refers to the type of dataset used to train the ML algorithm: in a supervised dataset, the output variable (y) of each example in the dataset is a known value. I will cover unsupervised learning in a future blog.

 

Example – Predicting house prices

If I am training an algorithm to predict house prices in my area, I would gather a dataset of houses that have previously been sold in my area.

I would gather as many features of the sold houses as I could. Certain features are more important than others, but all features play some role, so the more the better.
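Such a supervised dataset might look like the following sketch, where every example pairs input features with a known selling price (all figures are invented for illustration):

```python
# A hypothetical supervised dataset: each example records input features
# together with the known output variable (the actual selling price).
houses = [
    {"size_m2": 90,  "bedrooms": 2, "garage": 1, "price": 250_000},
    {"size_m2": 120, "bedrooms": 3, "garage": 1, "price": 310_000},
    {"size_m2": 70,  "bedrooms": 2, "garage": 0, "price": 195_000},
]
```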


In our example, our goal is a linear equation that will predict the selling price of a house in the future.


When training on our example dataset, the ML model will iterate through the dataset, inserting the input values into the formula and comparing its predicted price to the actual price. If the predicted price and the actual price differ, the model adjusts the coefficients and continues.

This process continues for a set number of iterations or until an accuracy threshold has been achieved. The amount by which the coefficients are altered depends on the type of linear regression algorithm used.

Dealing with Non-Numerical Input Variables

Typically, non-numerical (categorical) input or output variables are assigned a numeric value to represent each possible option of the variable.

For example, for the Color input variable: White = 1, Yellow = 2, Red = 3, and so on. For binary input variables like Garage: No = -1 and Yes = 1.
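The mapping above can be sketched as a simple lookup (the code mappings mirror the example values in the text):

```python
# Assign each category of a non-numerical variable a numeric code.
COLOR_CODES = {"White": 1, "Yellow": 2, "Red": 3}
GARAGE_CODES = {"No": -1, "Yes": 1}

def encode(color, garage):
    """Convert categorical inputs into the numeric values the model expects."""
    return COLOR_CODES[color], GARAGE_CODES[garage]

print(encode("Red", "Yes"))  # (3, 1)
```

Note that integer codes imply an ordering between categories; for unordered categories such as colour, one-hot encoding is often preferred in practice.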

Types of Linear Regression

 

Ordinary Least Squares

Ordinary Least Squares attempts to minimize the sum of the squared residuals. Imagine our dataset plotted as points on a plane, where each point is one example in our dataset. OLS draws the line through those points that minimizes the total squared vertical distance from the line to every point.
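For a single input variable, OLS has a well-known closed-form solution. A minimal pure-Python sketch (the data points are illustrative):

```python
def ols_fit(xs, ys):
    """Closed-form OLS for one input variable: minimizes the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept shifts the line
    # so that it passes through the point of means.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover those coefficients
print(ols_fit([1, 2, 3], [3, 5, 7]))  # (2.0, 1.0)
```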

 

Gradient Descent

With gradient descent, the coefficients are initially determined randomly and then adjusted in steps whose size depends on the learning rate (α), which is set by the user. Each coefficient is adjusted in the direction that reduces the error.
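A sketch of gradient descent for a single input variable, assuming a mean-squared-error cost (the data, learning rate, and iteration count are illustrative):

```python
import random

def gd_fit(xs, ys, alpha=0.05, iterations=2000):
    """Fit y = b0 + b1*x by gradient descent, starting from random coefficients."""
    b0, b1 = random.random(), random.random()
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean squared error with respect to each coefficient
        grad_b0 = sum((b0 + b1 * x) - y for x, y in zip(xs, ys)) * 2 / n
        grad_b1 = sum(((b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) * 2 / n
        # Step in the direction that reduces the error, scaled by alpha
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1

b0, b1 = gd_fit([1, 2, 3], [3, 5, 7])
print(round(b0, 2), round(b1, 2))  # ≈ 1.0 2.0
```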

 

Regularization

Regularization attempts to reduce the complexity of the model while also reducing the error at the same time. Two examples of regularization are:

 

Lasso Regression

Lasso Regression is a modified OLS that also minimizes the sum of the absolute values of the coefficients (an L1 penalty).

 

Ridge Regression

Ridge Regression is also a modified OLS; it additionally minimizes the sum of the squared coefficients (an L2 penalty).
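The three cost functions can be sketched side by side; lam, the penalty weight, is an illustrative hyperparameter (often written λ or alpha in libraries):

```python
def ols_cost(residuals):
    """Plain OLS: sum of the squared residuals."""
    return sum(r ** 2 for r in residuals)

def lasso_cost(residuals, coefs, lam=1.0):
    """Lasso adds an L1 penalty: the sum of absolute coefficient values."""
    return ols_cost(residuals) + lam * sum(abs(b) for b in coefs)

def ridge_cost(residuals, coefs, lam=1.0):
    """Ridge adds an L2 penalty: the sum of squared coefficient values."""
    return ols_cost(residuals) + lam * sum(b ** 2 for b in coefs)

residuals, coefs = [1.0, -1.0], [3.0, -2.0]
print(lasso_cost(residuals, coefs))  # 2 + (3 + 2) = 7.0
print(ridge_cost(residuals, coefs))  # 2 + (9 + 4) = 15.0
```

Because both penalties grow with the size of the coefficients, minimizing these costs pushes the model toward smaller (simpler) coefficients while still fitting the data.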

 

Conclusion

In this blog, we covered:

  • How linear regression works in detail
  • Different types of linear regression
  • Core machine learning terminology and concepts

Alan Lehane, Software Developer
Alan has been working with Aspira for 4 years as a Software Developer, specialising in Data Analytics and Machine Learning. He has provided a wide variety of services to Aspira’s clients including Software Development, Test Automation, Data Analysis and Machine Learning.

 
