Explain Your ML Model with LIME in Python

Introduction

Machine learning models, especially ensemble methods like XGBoost, often act like black boxes: accurate, but difficult to interpret. That's where LIME comes in! LIME stands for Local Interpretable Model-Agnostic Explanations and is a powerful tool for understanding how features impact individual predictions.

In this blog, we'll walk through:

A basic XGBoost model
Using LIME for interpretation
Understanding LIME functions
Why LIME matters in real-world scenarios

First, install the lime library using pip:

pip install lime

Import Libraries

import lime
import lime.lime_tabular
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
Dataset: Breast Cancer Classification
We’ll use the Breast Cancer Wisconsin (Diagnostic) dataset available from sklearn.datasets. This dataset contains features computed from digitized images of breast mass tissue and is used to classify tumors as malignant or benign.
Data Description
| Feature Name | Description |
|---|---|
| mean radius | Mean of distances from center to points on the perimeter |
| mean texture | Standard deviation of gray-scale values |
| mean perimeter | Mean size of the core tumor |
| mean area | Mean area of the tumor |
| mean smoothness | Local variation in radius lengths |
| … | … |
| target | Classification (0 = malignant, 1 = benign) |
Note: The dataset includes 30 numeric features and a binary target variable.
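Before modeling, it's worth confirming the shape and class balance for yourself. A quick sanity check (not part of the modeling pipeline):

```python
from sklearn.datasets import load_breast_cancer
import numpy as np

# Load the dataset and confirm its dimensions
data = load_breast_cancer()
print(data.data.shape)          # (569, 30): 569 samples, 30 numeric features
print(list(data.target_names))  # ['malignant', 'benign']

# Class balance: index 0 = malignant, index 1 = benign
counts = np.bincount(data.target)
print(counts)                   # [212 357]
```

The classes are moderately imbalanced (212 malignant vs. 357 benign), which is useful context when judging the accuracy number later on.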
Building a Simple XGBoost Classifier
# Load data
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = XGBClassifier()
model.fit(X_train, y_train)
# Evaluate
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))

Accuracy: 0.956140350877193
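If xgboost isn't installed, the same pipeline runs unchanged with scikit-learn's GradientBoostingClassifier, which exposes the same fit/predict interface. This is a stand-in for readers without xgboost, not the model used in this post, and its accuracy will differ slightly:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Drop-in replacement: same fit/predict/predict_proba interface as XGBClassifier,
# so the LIME code later in the post works with it unchanged
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
```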
Why Do We Need LIME?
Although the model is accurate, we don’t know how it’s making decisions. For example:
Why did it predict malignant for a particular tumor?
Which features pushed the prediction toward malignant or benign?
This is where LIME helps.
LIME (Local Interpretable Model-Agnostic Explanations) explains individual predictions by:
Creating many similar samples around one instance
Observing how predictions change
Fitting a simple (interpretable) model locally
Showing which features most influenced the prediction
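The four steps above can be sketched by hand with a weighted linear surrogate. This is a toy illustration of the idea on a made-up black box, not LIME's actual implementation (LIME also discretizes features and samples differently):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A "black box": its output depends strongly on features 0 and 1, not on 2
def black_box(X):
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))

instance = np.array([0.5, -0.2, 0.1])

# Step 1: create many similar samples around the one instance
samples = instance + rng.normal(scale=0.3, size=(500, 3))

# Step 2: observe how the black-box predictions change
probs = black_box(samples)

# Step 3: fit a simple (interpretable) model locally,
# weighting samples close to the instance more heavily
dists = np.linalg.norm(samples - instance, axis=1)
weights = np.exp(-(dists ** 2) / 0.25)
surrogate = Ridge(alpha=1.0).fit(samples, probs, sample_weight=weights)

# Step 4: the surrogate's coefficients show which features
# most influenced the prediction near this instance
print(dict(zip(["f0", "f1", "f2"], surrogate.coef_.round(3))))
```

The surrogate recovers the local structure: a positive weight on feature 0, a negative weight on feature 1, and a near-zero weight on the irrelevant feature 2.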
Using LIME to Explain XGBoost
# Initialize LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X.columns.tolist(),
    class_names=data.target_names,
    mode='classification'
)
# Pick an instance to explain
i = 15
instance = X_test.iloc[i]

exp = explainer.explain_instance(
    data_row=instance.values,
    predict_fn=model.predict_proba
)

# Show explanation
exp.show_in_notebook(show_table=True)
What LIME Shows You
LIME explains how your model predicted this tumor as malignant with 100% confidence by showing which features most strongly influenced this specific prediction.
Left Panel: Feature Impact - The left chart shows which features influenced the malignant prediction:
Blue bars: Features that pushed the prediction toward malignant
Orange bars: Features that slightly supported benign
Top features for malignant:
worst concave points > 0.18 (+0.25)
worst area > 1031.50 (+0.24)
worst texture > 29.69 (+0.21)
Minor support for benign:
compactness error > 0.03 (−0.08)
Interpretation:
This tumor has very high values for features like worst area, worst texture, and worst concave points, which are known to be associated with malignancy. The model picked up on these and used them heavily in its decision.
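As a quick arithmetic check on the chart, the signed weights quoted above can be netted out (values copied from the explanation; sign convention assumed: positive toward malignant, negative toward benign):

```python
# Signed LIME weights quoted above for this one instance
weights = {
    "worst concave points > 0.18": 0.25,
    "worst area > 1031.50": 0.24,
    "worst texture > 29.69": 0.21,
    "compactness error > 0.03": -0.08,
}

# Net local evidence: the malignant contributions dominate
net = sum(weights.values())
print(round(net, 2))  # 0.62
```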
Conclusion
XGBoost gives us high accuracy, but LIME gives us explainability. By zooming into individual predictions, we get a better understanding of which features drive outcomes, build trust in our models, and catch potential errors early.
Use LIME when you need to explain the output of a model that acts like a black box. It’s simple to use, powerful to apply, and essential in any real-world machine learning project.