This notebook is designed to demonstrate (and so document) how to use the
shap.dependence_plot function. It uses an XGBoost model trained on the classic UCI adult income dataset (a classification task: predict whether a person made over $50k in the 1990s).
import xgboost
import shap

# train an XGBoost model
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y)

# compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
A dependence plot is a scatter plot that shows the effect a single feature has on the predictions made by the model. In this example the log-odds of making over 50k increases significantly between age 20 and 40.
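As a mental model of what gets plotted, each point pairs one sample's value of the chosen feature with that sample's SHAP value for the same feature. The numbers below are made up purely for illustration, not taken from the model above:

```python
# Hypothetical feature column and its per-sample SHAP values
# (made-up numbers; in practice these come from X and shap_values)
ages     = [22, 31, 45, 58]        # the i'th feature column of the data
age_shap = [-0.9, -0.2, 0.6, 0.4]  # the i'th column of the SHAP value matrix

# A dependence plot is a scatter of these (feature value, SHAP value) pairs
points = list(zip(ages, age_shap))
print(points)  # [(22, -0.9), (31, -0.2), (45, 0.6), (58, 0.4)]
```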
# The first argument is the index of the feature we want to plot
# The second argument is the matrix of SHAP values (it is the same shape as the data matrix)
# The third argument is the data matrix (a pandas dataframe or numpy array)
shap.dependence_plot(0, shap_values, X)
# If we pass a numpy array instead of a data frame then we
# need to pass the feature names in separately
shap.dependence_plot(0, shap_values, X.values, feature_names=X.columns)
# We can pass a feature name instead of an index
shap.dependence_plot("Age", shap_values, X)
# We can also use the special "rank(i)" syntax to specify the i'th most
# important feature to the model, as measured by: np.abs(shap_values).mean(0)
# In this example age is the second most important feature.
shap.dependence_plot("rank(1)", shap_values, X)
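To see how "rank(i)" resolves to a column, here is the same mean-absolute-SHAP ranking computed by hand on a tiny made-up SHAP matrix (plain Python standing in for np.abs(shap_values).mean(0)):

```python
# Hypothetical SHAP values: 3 samples x 3 features (made-up numbers)
shap_vals = [
    [ 0.5, -1.2, 0.1],
    [-0.4,  1.0, 0.2],
    [ 0.6, -0.8, 0.0],
]

# Mean absolute SHAP value per feature: the importance measure
# behind the "rank(i)" syntax
n = len(shap_vals)
importance = [sum(abs(row[j]) for row in shap_vals) / n
              for j in range(len(shap_vals[0]))]

# Feature indices sorted by importance, descending;
# "rank(0)" is the first entry, "rank(1)" the second, and so on
order = sorted(range(len(importance)), key=lambda j: -importance[j])
print(order)  # [1, 0, 2] -> "rank(1)" refers to feature 0 here
```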
# The interaction_index argument can be used to explicitly
# set which feature gets used for coloring
shap.dependence_plot("rank(1)", shap_values, X, interaction_index="Education-Num")
# We can turn off interaction coloring
shap.dependence_plot("Age", shap_values, X, interaction_index=None)