Explain Your ML Model with LIME in Python

Introduction

Machine learning models, especially ensemble methods like XGBoost, often act like black boxes: accurate, but difficult to interpret. That's where LIME comes in! LIME stands for Local Interpretable Model-Agnostic Explanations and is a powerful tool for understanding how features impact individual predictions.

In this blog, we'll walk through:

A basic XGBoost model
Using LIME for interpretation
Understanding LIME functions
Why LIME matters in real-world scenarios

First, install the lime library using pip:

pip install lime

Import Libraries

import lime
import lime.lime_tabular
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
Dataset: Breast Cancer Classification
We’ll use the Breast Cancer Wisconsin (Diagnostic) dataset available from sklearn.datasets. This dataset contains features computed from digitized images of breast mass tissue and is used to classify tumors as malignant or benign.
Data Description
| Feature Name | Description |
|---|---|
| mean radius | Mean of distances from center to points on the perimeter |
| mean texture | Standard deviation of gray-scale values |
| mean perimeter | Mean size of the core tumor |
| mean area | Mean area of the tumor |
| mean smoothness | Local variation in radius lengths |
| … | … |
| target | Classification (0 = malignant, 1 = benign) |
Note: The dataset includes 30 numeric features and a binary target variable.
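Before modeling, it's worth confirming the shape and class balance for yourself. A quick sanity check (not part of the modeling pipeline):

```python
from sklearn.datasets import load_breast_cancer
import numpy as np

# Load the dataset and confirm its dimensions
data = load_breast_cancer()
print(data.data.shape)          # (569, 30): 569 samples, 30 numeric features
print(list(data.target_names))  # ['malignant', 'benign']

# Class balance: index 0 = malignant, index 1 = benign
counts = np.bincount(data.target)
print(counts)                   # [212 357]
```

The classes are moderately imbalanced (212 malignant vs. 357 benign), which is useful context when judging the accuracy number later on.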
Building a Simple XGBoost Classifier
# Load data
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = XGBClassifier()
model.fit(X_train, y_train)
# Evaluate
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))

Accuracy: 0.956140350877193
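If xgboost isn't installed, the same pipeline runs unchanged with scikit-learn's GradientBoostingClassifier, which exposes the same fit/predict interface. This is a stand-in for readers without xgboost, not the model used in this post, and its accuracy will differ slightly:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Drop-in replacement: same fit/predict/predict_proba interface as XGBClassifier,
# so the LIME code later in the post works with it unchanged
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
```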
Why Do We Need LIME?
Although the model is accurate, we don’t know how it’s making decisions. For example:
Why did it predict malignant for a particular tumor?
Which features pushed the prediction toward malignant or benign?
This is where LIME helps.
LIME (Local Interpretable Model-Agnostic Explanations) explains individual predictions by:
Creating many similar samples around one instance
Observing how predictions change
Fitting a simple (interpretable) model locally
Showing which features most influenced the prediction
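The four steps above can be sketched by hand with a weighted linear surrogate. This is a toy illustration of the idea on a made-up black box, not LIME's actual implementation (LIME also discretizes features and samples differently):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A "black box": its output depends strongly on features 0 and 1, not on 2
def black_box(X):
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))

instance = np.array([0.5, -0.2, 0.1])

# Step 1: create many similar samples around the one instance
samples = instance + rng.normal(scale=0.3, size=(500, 3))

# Step 2: observe how the black-box predictions change
probs = black_box(samples)

# Step 3: fit a simple (interpretable) model locally,
# weighting samples close to the instance more heavily
dists = np.linalg.norm(samples - instance, axis=1)
weights = np.exp(-(dists ** 2) / 0.25)
surrogate = Ridge(alpha=1.0).fit(samples, probs, sample_weight=weights)

# Step 4: the surrogate's coefficients show which features
# most influenced the prediction near this instance
print(dict(zip(["f0", "f1", "f2"], surrogate.coef_.round(3))))
```

The surrogate recovers the local structure: a positive weight on feature 0, a negative weight on feature 1, and a near-zero weight on the irrelevant feature 2.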
Using LIME to Explain XGBoost
# Initialize LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X.columns.tolist(),
    class_names=data.target_names,
    mode='classification'
)
# Pick an instance to explain
i = 15
instance = X_test.iloc[i]

exp = explainer.explain_instance(
    data_row=instance.values,
    predict_fn=model.predict_proba
)

# Show explanation
exp.show_in_notebook(show_table=True)
What LIME Shows You
LIME explains how your model predicted this tumor as malignant with 100% confidence by showing which features most strongly influenced this specific prediction.
Left Panel: Feature Impact - The left chart shows which features influenced the malignant prediction:
Blue bars: Features that pushed the prediction toward malignant
Orange bars: Features that slightly supported benign
Top features for malignant:
worst concave points > 0.18 (+0.25)
worst area > 1031.50 (+0.24)
worst texture > 29.69 (+0.21)
Minor support for benign:
compactness error > 0.03 (−0.08)
Interpretation:
This tumor has very high values for features like worst area, worst texture, and worst concave points, which are known to be associated with malignancy. The model picked up on these and used them heavily in its decision.
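As a quick arithmetic check on the chart, the signed weights quoted above can be netted out (values copied from the explanation; sign convention assumed: positive toward malignant, negative toward benign):

```python
# Signed LIME weights quoted above for this one instance
weights = {
    "worst concave points > 0.18": 0.25,
    "worst area > 1031.50": 0.24,
    "worst texture > 29.69": 0.21,
    "compactness error > 0.03": -0.08,
}

# Net local evidence: the malignant contributions dominate
net = sum(weights.values())
print(round(net, 2))  # 0.62
```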
Conclusion
XGBoost gives us high accuracy, but LIME gives us explainability. By zooming into individual predictions, we get a better understanding of which features drive outcomes, build trust in our models, and catch potential errors early.
Use LIME when you need to explain the output of a model that acts like a black box. It’s simple to use, powerful to apply, and essential in any real-world machine learning project.