Ensemble method - Boosting (Regression)

Notes for lecture-21 by Shashank Shekhar

To avoid underfitting and overfitting, an algorithm's bias and variance must be balanced. Boosting is one such method, aimed primarily at reducing bias.


Ensemble methods are techniques that create multiple models and then combine them to produce improved results. An ensemble usually produces more accurate solutions than a single model would. Here, "model" refers to the output of an algorithm trained on data.

Boosting:

Boosting is the process through which weak learners are converted into strong learners. The main principle of boosting is that a sequence of weak models is fitted on weighted versions of the dataset, and the examples that previous learners handled poorly (misclassified points in classification, points with large errors in regression) are given more weight. The predictions are then combined through a weighted sum (for regression) to produce the final prediction. A weak learning algorithm is one whose performance is at least slightly better than random chance.
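In standard notation (the symbols below are not from the original notes), the combined model is a weighted sum of the M weak learners h_m with weights alpha_m:

    F(x) = \sum_{m=1}^{M} \alpha_m h_m(x)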


Gradient boosting (for regression):

The gradient boosting algorithm uses gradient descent to minimize a loss function. At each stage it computes the residuals (the differences between the current predictions and the known target values), trains a weak model that maps the features to these residuals, and adds this model's (scaled) output to the current prediction, producing a more accurate predicted value. Repeating this step improves the overall model prediction.
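The word "gradient" refers to the fact that, for the squared-error loss L(y, F) = 0.5 * (y - F)^2, the residual y - F is exactly the negative gradient of the loss with respect to the current prediction F. A quick numerical check of this identity (the values here are made up for illustration):

    import numpy as np

    y = np.array([3.0, -1.0, 2.5])   # true targets (illustrative values)
    F = np.array([2.0,  0.5, 2.0])   # current model's predictions

    residual = y - F                 # what the next weak learner is trained on
    neg_gradient = -(F - y)          # -dL/dF for L = 0.5 * (y - F)**2

    print(np.allclose(residual, neg_gradient))   # True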

To implement gradient boosting regression, the following steps are followed (a from-scratch sketch is given after this list):

  • Select a weak learner
  • Use an additive model
  • Define a loss function
  • Minimize the loss function
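As a concrete illustration of these four steps, here is a minimal from-scratch sketch (not scikit-learn's internal implementation) that uses shallow decision trees as the weak learner, squared error as the loss, and an additive update; the synthetic data and hyper-parameter values are arbitrary choices for demonstration:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 10, size=(200, 1))            # synthetic inputs
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # noisy targets

    n_stages, learning_rate = 50, 0.1
    prediction = np.full_like(y, y.mean())           # additive model starts from a constant
    trees = []

    for _ in range(n_stages):
        residual = y - prediction                    # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=2)    # step 1: a weak learner
        tree.fit(X, residual)                        # fit the weak learner to the residual ...
        prediction += learning_rate * tree.predict(X)  # ... and add it to the model
        trees.append(tree)

    print("training MSE:", np.mean((y - prediction) ** 2))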
Advantages of Gradient Boosting:

  • Better accuracy: Gradient boosting regression often achieves better accuracy than simpler regression methods such as linear regression.
  • Less pre-processing: Data pre-processing is an important step in machine learning, and model accuracy suffers when it is not done properly. Gradient boosting regression requires minimal data pre-processing, making it faster to implement with less complexity.
  • Higher flexibility: Gradient boosting regression is highly flexible, as it can be used with many hyper-parameters and loss functions.
  • Missing data: Several gradient boosting implementations (for example, XGBoost, LightGBM, and scikit-learn's HistGradientBoostingRegressor) handle missing values in the training data on their own, without requiring explicit imputation.
Gradient Boosting Parameters:

    Here, we will discuss some important parameters of the sklearn.ensemble.GradientBoostingRegressor class.

  • Loss (loss): The loss function to be optimized. Its default value is 'squared_error', i.e. least squares regression (named 'ls' in older scikit-learn versions).
  • Number of estimators (n_estimators): The number of boosting stages to be performed by the model. Its default value is 100. A higher number of estimators usually results in better performance but increases the training time.
  • Maximum Depth (max_depth): The depth of each decision tree estimator in the gradient boosting regressor. Its default value is 3, but the best value depends on the input data.
  • Learning rate (learning_rate): A hyper-parameter that scales the contribution of each tree; it determines the step size at each iteration while moving toward a minimum of the loss function. Its default value is 0.1.
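For reference, this is how these four parameters are passed when constructing the regressor (the values shown are simply the defaults):

    from sklearn.ensemble import GradientBoostingRegressor

    model = GradientBoostingRegressor(
        loss="squared_error",   # the default loss (least squares)
        n_estimators=100,       # number of boosting stages (default)
        max_depth=3,            # depth of each tree (default)
        learning_rate=0.1,      # shrinkage of each tree's contribution (default)
    )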

Implementing Gradient Boosting Regression:

    I will be using data from this site. We have y-values corresponding to x-values, which we will extract from the csv file.
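    A loading sketch, assuming the file is named data.csv and has columns x and y (both the file name and the column names are assumptions, since the original link is not reproduced here):

        import pandas as pd

        df = pd.read_csv("data.csv")   # hypothetical file name
        x = df[["x"]].values           # 2-D feature array, as scikit-learn expects
        y = df["y"].values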

    Now we are going to train the model. We import the GradientBoostingRegressor class from sklearn.ensemble, create an instance gradient_br of it with chosen parameter values, and then call the fit method on that instance. Here, we set n_estimators=5, max_depth=3 and learning_rate=1.
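    Continuing from the loading sketch above, the training step would look like this:

        from sklearn.ensemble import GradientBoostingRegressor

        gradient_br = GradientBoostingRegressor(n_estimators=5, max_depth=3, learning_rate=1)
        gradient_br.fit(x, y)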

    Now let's visualize the model that we have created. The graph shows the predicted values against x, and it looks like a good fit. We have used pyplot to plot the graph.
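    A plotting sketch with pyplot, continuing from the variables above (sorting by x so the prediction draws as a smooth line):

        import matplotlib.pyplot as plt

        xs = x.ravel()
        order = xs.argsort()                     # sort so the prediction plots as a line
        plt.scatter(xs, y, s=10, label="data")
        plt.plot(xs[order], gradient_br.predict(x)[order], color="red", label="model prediction")
        plt.xlabel("x")
        plt.ylabel("y")
        plt.legend()
        plt.show()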

    Now, we will see quantitatively how well the model fits the data.
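    The score method of scikit-learn regressors returns the R-squared (coefficient of determination); the exact number depends on the data behind the link above:

        r2 = gradient_br.score(x, y)
        print(f"R^2 score: {r2:.4f}")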

    We can see that our model's fit score (R-squared) is around 0.9944, i.e. about 99.44%. The GitHub link for the above code and the csv file is given here.

