Ensemble Learning is a powerful method used in Machine Learning to improve model performance by combining multiple individual models. These individual models, also known as “base models” or “weak learners,” may have limitations such as high variance or high bias.

There are two main types of ensemble models:

  1. Homogeneous Ensemble: Homogeneous ensembles combine multiple base models of the same type. These models are trained independently on different subsets of the training data and their predictions are combined to make the final prediction.
  2. Heterogeneous Ensemble: Heterogeneous ensembles combine base models of different types. These models may have various architectures or algorithms, allowing them to capture diverse aspects of the data and make complementary predictions.

There are three types:

  • bagging
  • boosting
  • stacking

We will provide Python implementation examples for each of these models. We will use data about housing in California in 1990, a dataset containing information on housing prices and related factors for various geographical locations in California.

First, we need to load the data and do some basic preprocessing:

  • One Hot encoding of categorical features
  • Removal of missing values (low percentage of these values)
  • Data standardization
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read dataset
df = pd.read_csv('/kaggle/input/housing/housing.csv')

# Encode categorical features
encoder = OneHotEncoder()
df_ohe =  encoder.fit_transform(df[['ocean_proximity']]).toarray()

# Merge OHE features with dataset
df = pd.concat((
    df_ohe), axis=1)
df.columns = [col.replace('<','less') for col in df.columns]
# Remove missing values
df = df.dropna()

# Split into train and test
df_train, df_test = train_test_split(
    df, test_size=0.2, random_state=42)

# Standarize data
scaler = StandardScaler()
df_train_scaled = pd.DataFrame(
df_test_scaled = pd.DataFrame(
# Split into labels and features
X_train = df_train_scaled.drop(columns=['median_house_value'])
y_train = df_train_scaled[['median_house_value']]
X_test = df_test_scaled.drop(columns=['median_house_value'])
y_test = df_test_scaled[['median_house_value']]

After this basic processing, we can start training and evaluating our models. Please keep in mind that while we would like to conduct a more extensive analysis, it is beyond the scope of this article.

Stay up-to-date with our latest articles!
Subscribe to our free newsletter by entering your email address below.


It stands for “bootstrap aggregating” and involves training multiple base models independently on different subsets of the training data. Each model is trained on a randomly sampled subset of the original data with replacement. The final prediction is obtained by aggregating the predictions of all the individual models, such as by majority voting for classification or averaging for regression.

Some examples of Bagging are:

  • Random Forest: A collection of decision trees where each tree is trained on a random subset of features and aggregated to make predictions.
  • Bagged SVM: Multiple Support Vector Machines (SVMs) trained on different subsets of the data to create an ensemble prediction.

Let’s see how Random Forest can be implemented in Python:

from sklearn.ensemble import RandomForestRegressor

# Create a Random Forest Regressor
model = RandomForestRegressor(n_estimators=10)

# Train the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)


Boosting is an ensemble technique that trains models sequentially, where each model tries to correct the mistakes made by the previous models. The models are trained on weighted versions of the training data, where the weights are adjusted based on the performance of the previous models. Boosting focuses more on samples that are misclassified or difficult to classify.

Two fundamental and commonly known examples of Boosting are:

  • AdaBoost: A boosting algorithm that combines weak learners (e.g., decision stumps) by iteratively focusing on misclassified samples and adjusting the weights to improve the overall prediction.
  • Gradient Boosting: Builds an ensemble of models, typically decision trees, where each subsequent model is trained to correct the mistakes made by the previous models.

We can show how to implement Gradient Boosting in Python:

from sklearn.ensemble import GradientBoostingRegressor

# Create a Gradient Boosting Regressor
model = GradientBoostingRegressor(n_estimators=100, 

# Train the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

In addition to these, a few more advanced and complex examples are:

  1. XGBoost: eXtreme Gradient Boosting (XGBoost) is an optimized gradient boosting algorithm known for its efficiency and performance. It includes enhancements such as regularization, parallel processing, and tree pruning to improve model accuracy.
  2. LightGBM: LightGBM is another gradient boosting framework that focuses on achieving high performance and efficiency. It uses techniques like leaf-wise tree growth and gradient-based one-sided sampling to speed up training and reduce memory usage.
  3. CatBoost: CatBoost is a gradient boosting algorithm that is specifically designed to handle categorical features effectively. It automatically handles categorical variables and provides improved accuracy without extensive preprocessing.

XGBoost has become a popular choice among Kaggle participants, which is why we also show an implementation in Python of this model.

import xgboost as xgb

# Create the XGBoost regressor
model = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1)

# Train the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)


It involves training multiple base models and combining their predictions using another model called a “meta-model” or “stacker.” The predictions of the base models serve as input features for the meta-model, which learns to make the final prediction based on these inputs. Stacking allows the meta-model to leverage the diverse predictions of the base models and potentially uncover more complex relationships in the data.

A possible stacking model could involve a random forest and linear regression as base models, and another linear regression model as the meta-model. Check below the implementation in Python:

from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import StackingRegressor

# Define the base models
base_models = [
    ('random_forest', RandomForestRegressor(n_estimators=10)),
    ('linear_regression', LinearRegression())

# Define the meta-model
meta_model = LinearRegression()

# Create the Stacking Regressor
model = StackingRegressor(estimators=base_models, 

# Train the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)


These ensemble methods are widely used in practice and can significantly improve model performance, generalization, and robustness.

We can compare the previous models for this particular dataset and see how they perform. We will calculate the MSE, MAE and R2 scores:

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, predictions)

# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, predictions)

# Calculate R2 score
r2 = r2_score(y_test, predictions)

We can see below the metrics for each model. As expected, XGBoost, the most widely used model outperforms the rest of the models.

Random Forest (Bagging)0.1970.2910.809
Gradient Boosting (Boosting)0.2420.3410.766
XGBoost (Boosting)0.1840.2860.822

Ensemble learning is a powerful technique that leverages the collective wisdom of multiple models to improve prediction accuracy and generalization. By combining the strengths of individual models or leveraging diverse algorithms, ensemble models can mitigate weaknesses and provide more robust and reliable predictions. They are widely used in various domains, such as machine learning competitions, data analysis, and real-world applications, to achieve superior performance and enhance decision-making capabilities.


Leave a Reply

Avatar placeholder

Your email address will not be published. Required fields are marked *