Simple Linear Regression


In the realm of machine learning, one of the most basic yet powerful techniques for predicting continuous outcomes is Simple Linear Regression. This fundamental approach has been widely used in various fields, including economics, medicine, and finance, due to its simplicity and effectiveness in modeling relationships between variables. In this blog post, we will delve into the math behind Simple Linear Regression, discuss the code implementation using Python, and explore how to create a custom Simple Linear Regression class.

Mathematical Background

Simple Linear Regression is a linear model that predicts a continuous output variable from a single input feature (with more than one feature, the same idea is called multiple linear regression). The underlying mathematical goal is to find the best-fitting line that minimizes the sum of squared errors (SSE) between the observed responses and the predicted values.

Let’s denote our response variable as y and input feature(s) as x. We aim to find the best linear combination of x that predicts y:

y = β0 + β1x + ε

where β0 is the intercept, β1 is the slope coefficient, and ε represents the random error term.

The goal is to estimate the values of β0 and β1 using a dataset. This can be achieved by minimizing the SSE function:

SSE = Σ[(y_i – ŷ_i)^2]

where ŷ_i is the predicted value for the i-th observation. Substituting the model ŷ_i = β0 + β1x_i gives:

SSE = Σ[(y_i – (β0 + β1x_i))^2]

To minimize SSE, we take the partial derivatives of the equation with respect to β0 and β1, set them to zero, and solve for these coefficients.

∂SSE/∂β0 = -2Σ[y_i – (β0 + β1x_i)] = 0

∂SSE/∂β1 = -2Σ[x_i(y_i – (β0 + β1x_i))] = 0

By solving these equations, we can obtain the estimates of β0 and β1.
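Solving these normal equations yields the familiar closed-form estimates β1 = Σ[(x_i – x̄)(y_i – ȳ)] / Σ[(x_i – x̄)^2] and β0 = ȳ – β1x̄. As a minimal sketch (the function name `ols_estimates` is our own), these can be computed directly with NumPy:

```python
import numpy as np

def ols_estimates(x, y):
    """Closed-form least-squares estimates for y = b0 + b1 * x."""
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Noise-free data on the line y = 1 + 2x recovers the coefficients exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1 + 2 * x
b0, b1 = ols_estimates(x, y)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")  # → b0 = 1.0, b1 = 2.0
```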

Python Implementation

We will use Python’s NumPy library to implement the Simple Linear Regression model. First, let’s import the necessary libraries:

```python
import numpy as np
```

Next, we define a function that calculates the SSE for our dataset:

```python
def calculate_sse(y_true, y_pred):
    return np.sum((y_true - y_pred) ** 2)
```
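As a quick sanity check with hand-picked numbers (the definition is repeated here so the snippet runs on its own):

```python
import numpy as np

def calculate_sse(y_true, y_pred):
    return np.sum((y_true - y_pred) ** 2)

# Errors are 0, -0.5, and 1, so SSE = 0 + 0.25 + 1 = 1.25
print(calculate_sse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.5, 2.0])))
```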

Now, let’s create a Simple Linear Regression class using Python:

```python
class SimpleLinearRegression:
    def __init__(self, alpha=0.1, num_iterations=10000):
        self.alpha = alpha
        self.num_iterations = num_iterations
        self.weights = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features + 1)

        for _ in range(self.num_iterations):
            y_pred = X @ self.weights[1:] + self.weights[0]
            error = y_pred - y

            # Gradients of the mean squared error with respect to
            # the intercept (weights[0]) and the slopes (weights[1:])
            self.weights[0] -= self.alpha * 2 * np.sum(error) / n_samples
            self.weights[1:] -= self.alpha * 2 * (X.T @ error) / n_samples

    def predict(self, X):
        return X @ self.weights[1:] + self.weights[0]
```

In the `fit` method, we use gradient descent to update the weights. At each iteration we compute the gradients of the mean squared error with respect to the intercept and the slope coefficients, multiply them by a small learning rate (`alpha`), and subtract the result from the current weights.

Using the SimpleLinearRegression Class

To demonstrate how to use our custom class, let’s create some sample data:

```python
import numpy as np

# Generate random data scattered around the line y = 3 + 2x
X = np.random.rand(100, 1)
y = 3 + 2 * X[:, 0] + np.random.randn(100) / 1.5
```

Now we can fit our model to this data using the following code:

```python
slr_model = SimpleLinearRegression()
slr_model.fit(X, y)
print(slr_model.weights)
y_pred = slr_model.predict(X)
```

In the `predict` method, we use the learned weights to make predictions on new input values.
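Because the synthetic data were generated from y = 3 + 2x plus noise, the learned intercept and slope should land near 3 and 2. One way to sanity-check the result is to compare against a closed-form fit; here is a sketch using NumPy's `np.polyfit` on data generated the same way (seeded for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 3 + 2 * X[:, 0] + rng.standard_normal(100) / 1.5

# polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(X[:, 0], y, deg=1)
print(f"intercept ≈ {intercept:.2f}, slope ≈ {slope:.2f}")
```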

Example Use Case: Predicting House Prices

Suppose we have a dataset of houses with their corresponding prices and features such as square footage and number of bedrooms, and we want to predict the price of a house from these features. (With more than one input feature this is, strictly speaking, multiple linear regression, but our class already handles that case.) The code would look something like this:

```python
import numpy as np

# Features: square footage and number of bedrooms
X = np.array([[1000.0, 2.0], [2000.0, 3.0], [3000.0, 4.0]])
y = np.array([500000.0, 700000.0, 900000.0])  # house prices

# Gradient descent diverges when features are on very different scales,
# so standardize each column before fitting
X_mean, X_std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - X_mean) / X_std

slr_model = SimpleLinearRegression()
slr_model.fit(X_scaled, y)

# Predict the price of a new house with 1500 sqft and 2 bedrooms,
# scaled with the same statistics as the training data
X_new = (np.array([[1500.0, 2.0]]) - X_mean) / X_std
y_pred = slr_model.predict(X_new)
print(f"Predicted price: {y_pred[0]:.2f}")
```

In this example, we use our custom Simple Linear Regression class to predict the price of a new house with specific features.
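Note that in this toy dataset the two features are perfectly correlated (each extra bedroom comes with exactly 1000 extra square feet), so many weight combinations fit the training data equally well. As a sanity check, a closed-form least-squares solve via `np.linalg.lstsq` should reproduce the training prices exactly:

```python
import numpy as np

X = np.array([[1000.0, 2.0], [2000.0, 3.0], [3000.0, 4.0]])
y = np.array([500000.0, 700000.0, 900000.0])

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coef)                      # [intercept, weight_sqft, weight_bedrooms]
print(np.allclose(A @ coef, y))  # the fit reproduces the training prices
```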

Conclusion

Simple Linear Regression is a fundamental technique in machine learning that can be used for predicting continuous outcomes based on one or more input features. In this blog post, we explored the math behind Simple Linear Regression and provided an implementation using Python. We also demonstrated how to create a custom Simple Linear Regression class from scratch. This approach can be applied to various domains such as economics, medicine, and finance to model relationships between variables.

Remember that in practice, you will often encounter more complex data distributions, non-linear relationships, and interactions between features, which may require more advanced machine learning techniques like Polynomial Regression, Ridge Regression, or Lasso Regression. However, Simple Linear Regression remains a useful building block for more sophisticated models and is an excellent starting point for beginners to explore the world of linear regression.

