transparentai.plots
All plotting functions, organized in submodules.
Common plot functions
Dataset variable plot functions
- transparentai.datasets.variable.variable_plots.plot_datetime_var(ax, arr, color='#3498db', label=None, alpha=1.0)
  Plots a line plot onto a matplotlib axis.
  Raises:
  - TypeError – arr is not array-like
  - TypeError – arr is not a datetime array
- transparentai.datasets.variable.variable_plots.plot_number_var(ax, arr, color='#3498db', label=None, alpha=1.0)
  Plots a histogram onto a matplotlib axis.
  Raises:
  - TypeError – arr is not array-like
  - TypeError – arr is not a number array
- transparentai.datasets.variable.variable_plots.plot_object_var(ax, arr, top=10, color='#3498db', label=None, alpha=1.0)
  Plots a bar plot onto a matplotlib axis.
  Raises:
  - TypeError – arr is not array-like
  - TypeError – arr is not an object array
- transparentai.datasets.variable.variable_plots.plot_table_describe(ax, cell_text)
  Inserts a table into a matplotlib graphic using an axis.
- transparentai.datasets.variable.variable_plots.plot_variable(arr, legend=None, colors=None, xlog=False, ylog=False, **kwargs)
  Plots a graph with two parts for a given array. The first part is a custom plot depending on the array dtype; the second part is a table of descriptive statistics.
  The first plot is:
  - a histogram if the dtype is number (using plot_number_var)
  - a line plot if the dtype is datetime (using plot_datetime_var)
  - a bar plot if the dtype is object (using plot_object_var)
  If the legend array is set then the different values are automatically plotted separately.
  Parameters:
  - arr (array like) – Array of values to plot
  - legend (array like (default None)) – Array of legend values (same length as arr)
  - colors (list (default None)) – Array of colors, used if legend is set
  - xlog (bool (default False)) – Scale the x axis in log scale
  - ylog (bool (default False)) – Scale the y axis in log scale
  Raises:
  - TypeError – arr is not array-like
  - TypeError – legend is not array-like
  - ValueError – arr and legend do not have the same length
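The dtype-based dispatch described above can be sketched in plain Python. This is a rough illustration of the selection logic only, not the library's actual implementation; the helper name pick_plot_kind is hypothetical:

```python
from datetime import datetime
from numbers import Number

def pick_plot_kind(arr):
    """Choose a plot kind the way plot_variable is described to:
    histogram for numbers, line plot for datetimes, bar plot otherwise."""
    if all(isinstance(v, Number) for v in arr):
        return 'histogram'  # would be drawn by plot_number_var
    if all(isinstance(v, datetime) for v in arr):
        return 'line'       # would be drawn by plot_datetime_var
    return 'bar'            # would be drawn by plot_object_var

print(pick_plot_kind([1, 2.5, 3]))             # histogram
print(pick_plot_kind([datetime(2020, 1, 1)]))  # line
print(pick_plot_kind(['a', 'b']))              # bar
```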
Classification plot functions
- transparentai.models.classification.classification_plots.compute_prob_performance(y_true, y_prob, metrics)
  Computes performance metrics that require probabilities.
  Parameters:
  - y_true (array like) – True labels
  - y_prob (array like) – Predicted probabilities
  - metrics (list) – List of metrics to compute
  Returns: Dictionary of the computed metrics that require probabilities. If no metric needs probabilities then it returns None.
  Return type: dict or None
  Raises: TypeError – metrics must be a list
- transparentai.models.classification.classification_plots.plot_confusion_matrix(confusion_matrix)
  Shows a confusion matrix plot.
  Parameters: confusion_matrix (array) – confusion_matrix metric result
- transparentai.models.classification.classification_plots.plot_performance(y_true, y_pred, y_true_valid=None, y_pred_valid=None, metrics=None, **kwargs)
  Plots the performance of a classifier. You can use the metrics of your choice with the metrics argument. It can compare train and validation sets.
  Parameters:
  - y_true (array like) – True labels
  - y_pred (array like (1D or 2D)) – if 1D array, predicted labels; if 2D array, probabilities (output of a predict_proba function)
  - y_true_valid (array like (default None)) – True labels for the validation set
  - y_pred_valid (array like (1D or 2D) (default None)) – if 1D array, predicted labels; if 2D array, probabilities (output of a predict_proba function)
  - metrics (list (default None)) – List of metrics to plot
  Raises: TypeError – if metrics is set it must be a list
- transparentai.models.classification.classification_plots.plot_roc_curve(roc_curve, roc_auc)
  Shows a ROC curve plot with the roc_auc score in the legend.
  Parameters:
  - roc_curve (array) – roc_curve metric result for each class
  - roc_auc (array) – roc_auc metric result for each class
- transparentai.models.classification.classification_plots.plot_score_function(perf, perf_prob, metric)
  Plots a score with its specific plot function, e.g. confusion_matrix or roc_auc.
  Raises: ValueError – metric does not have a plot function
- transparentai.models.classification.classification_plots.plot_table_score_clf(perf)
  Inserts a table of scores into a matplotlib graphic for a classifier.
  Parameters: perf (dict) – Dictionary of computed scores
- transparentai.models.classification.classification_plots.preprocess_scores(y_pred)
  Preprocesses y_pred for the plot_performance function.
  If y_pred contains probabilities then y_pred becomes the predicted classes and y_prob the probabilities; otherwise y_prob is None.
  Parameters: y_pred (array like (1D or 2D)) – if 1D array, predicted labels; if 2D array, probabilities (output of a predict_proba function)
  Returns:
  - np.ndarray – array of predicted labels
  - np.ndarray – array of probabilities if available, else None
  - int – number of classes
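The behavior described can be sketched with NumPy. This is only an illustration of the documented logic, assuming the predicted class is the argmax of each probability row; it is not the library's source:

```python
import numpy as np

def preprocess_scores_sketch(y_pred):
    """Split y_pred into (labels, probabilities, n_classes) as documented."""
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == 2:
        # 2D input: probabilities from predict_proba
        y_prob = y_pred
        labels = y_pred.argmax(axis=1)  # assumed: class = argmax of each row
        n_classes = y_pred.shape[1]
        return labels, y_prob, n_classes
    # 1D input: already predicted labels, no probabilities available
    return y_pred, None, len(np.unique(y_pred))

labels, probs, n = preprocess_scores_sketch([[0.1, 0.9], [0.8, 0.2]])
print(labels, n)  # [1 0] 2
```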
Regression plot functions
- transparentai.models.regression.regression_plots.plot_error_distribution(errors)
  Plots the error distribution with the standard deviation, mean and median.
  The error is calculated by the following formula:
  \[error = y - \hat{y}\]
  Parameters: errors (array like) – Errors of a regressor
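Computing the errors and the summary statistics shown in this plot can be sketched with the standard library. This is illustrative only; the function itself takes the precomputed errors:

```python
import statistics

# error = y - y_hat, per the formula above
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
errors = [t - p for t, p in zip(y_true, y_pred)]

print(errors)                     # [0.5, 0.0, -0.5, -1.0]
print(statistics.mean(errors))    # -0.25
print(statistics.median(errors))  # -0.25
print(statistics.stdev(errors))   # sample standard deviation
```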
- transparentai.models.regression.regression_plots.plot_performance(y_true, y_pred, y_true_valid=None, y_pred_valid=None, metrics=None, **kwargs)
  Plots the performance of a regressor. You can use the metrics of your choice with the metrics argument. It can compare train and validation sets.
  Parameters:
  - y_true (array like) – True target values
  - y_pred (array like) – Predicted values
  - y_true_valid (array like (default None)) – True target values for the validation set
  - y_pred_valid (array like (default None)) – Predicted values for the validation set
  - metrics (list (default None)) – List of metrics to plot
  Raises: TypeError – if metrics is set it must be a list
Explainer plot functions
- transparentai.models.explainers.explainer_plots.plot_global_feature_influence(feat_importance, color='#3498db', **kwargs)
  Displays the global feature influence, sorted.
  Parameters: feat_importance (pd.Series) – Feature importance with features as index and shap values as values
- transparentai.models.explainers.explainer_plots.plot_local_feature_influence(feat_importance, base_value, pred, pred_class=None, **kwargs)
  Displays the local feature influence, sorted, for a specific prediction.
  Parameters:
  - feat_importance (pd.Series) – Feature importance with features as index and shap values as values
  - base_value (number) – prediction value if no feature is put into the model
  - pred (number) – predicted value
Fairness plot functions
- transparentai.fairness.fairness_plots.format_priv_text(values, max_char)
  Formats privileged (or unprivileged) values text so that it can be shown.
  Returns: Formatted string for the given values
  Return type: str
  Raises: TypeError – values must be a list
- transparentai.fairness.fairness_plots.get_protected_attr_values(attr, df, privileged_group, privileged=True)
  Retrieves all values matching the privileged_group argument.
  If privileged is True and privileged_group[attr] is a list then it returns the list; if it is a function, it returns the values of df[attr] for which the function returns True.
  If privileged is False and privileged_group[attr] is a list then it returns the values of df[attr] not in the list; if it is a function, it returns the values of df[attr] for which the function returns False.
  Parameters:
  - attr (str) – Protected attribute which is a key of the privileged_group dictionary
  - df (pd.DataFrame) – Dataframe to extract the privileged group from.
  - privileged_group (dict) – Dictionary with a protected attribute as key (e.g. age or gender) and a list of favorable values (like ['Male']) or a function returning a boolean corresponding to a privileged group
  - privileged (bool (default True)) – Whether to condition this metric on the privileged groups (True) or the unprivileged groups (False).
  Returns: List of privileged values of the protected attribute attr if privileged is True, else unprivileged values
  Return type: list
  Raises: ValueError – attr must be in privileged_group
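The list-versus-function logic described above can be sketched without pandas. This is a simplified illustration over a plain list of column values; the real function operates on a DataFrame, and the helper name split_privileged is hypothetical:

```python
def split_privileged(values, spec, privileged=True):
    """Return the (un)privileged values given a list or predicate `spec`,
    mirroring the documented behavior of get_protected_attr_values."""
    if not callable(spec):
        if privileged:
            return list(spec)  # the list itself, as documented
        return [v for v in values if v not in spec]
    # function spec: keep values where the predicate matches `privileged`
    return [v for v in values if spec(v) == privileged]

genders = ['Male', 'Female', 'Male', 'Female']
print(split_privileged(genders, ['Male']))                   # ['Male']
print(split_privileged(genders, ['Male'], privileged=False)) # ['Female', 'Female']
print(split_privileged([25, 40, 60], lambda a: a > 30))      # [40, 60]
```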
- transparentai.fairness.fairness_plots.plot_attr_title(ax, attr, df, privileged_group)
  Plots the protected attribute title with:
  - The attribute name (e.g. Gender)
  - Privileged and unprivileged values
  - Number of privileged and unprivileged values
  Parameters:
  - ax (plt.axes.Axes) – axis to add the plot to
  - attr (str) – Protected attribute which is a key of the privileged_group dictionary
  - df (pd.DataFrame) – Dataframe to extract the privileged group from.
  - privileged_group (dict) – Dictionary with a protected attribute as key (e.g. age or gender) and a list of favorable values (like ['Male']) or a function returning a boolean corresponding to a privileged group
  Raises:
  - ValueError – attr must be in df columns
  - ValueError – attr must be in privileged_group keys
- transparentai.fairness.fairness_plots.plot_bias(y_true, y_pred, df, privileged_group, pos_label=1, regr_split=None, with_text=True, **kwargs)
  Plots the fairness metrics for the protected attributes referred to in the privileged_group argument.
  It uses the 4 fairness functions:
  - statistical_parity_difference
  - disparate_impact
  - equal_opportunity_difference
  - average_odds_difference
  You can also use it for a regression problem: set a value in the regr_split argument to convert it to a binary classification problem (use 'mean' to split at the mean). If the favorable label is above the split value set the pos_label argument to 1, else to 0.
  Example
  Using this function for a binary classifier:
  >>> from transparentai.datasets import load_adult
  >>> from sklearn.ensemble import RandomForestClassifier
  >>> data = load_adult()
  >>> X, Y = data.drop(columns='income'), data['income'].replace({'>50K':1, '<=50K':0})
  >>> X = X.select_dtypes('number')
  >>> clf = RandomForestClassifier().fit(X,Y)
  >>> y_pred = clf.predict(X)
  >>> privileged_group = { 'gender':['Male'] }
  >>> plot_bias(Y, y_pred, data, privileged_group, with_text=True)
  Parameters:
  - y_true (array like) – True labels
  - y_pred (array like) – Predicted labels
  - df (pd.DataFrame) – Dataframe to extract the privileged group from.
  - privileged_group (dict) – Dictionary with a protected attribute as key (e.g. age or gender) and a list of favorable values (like ['Male']) or a function returning a boolean corresponding to a privileged group
  - pos_label (number) – The label of the positive class.
  - regr_split ('mean' or number (default None)) – For a regression problem, converts the result to a binary classification using 'mean' or a chosen number. Both y_true and y_pred become 0 or 1: 0 if the value is less than or equal to the split value (the average if 'mean'), 1 if it is greater. If the favorable label is above the split value set pos_label=1, else pos_label=0.
  - with_text (bool (default True)) – Whether to display the explanation text for the fairness metrics.
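The regr_split binarization described above can be sketched as follows. This is an illustration of the documented rule only, not the library's code:

```python
def binarize(values, split):
    """Binarize regression values: 0 if <= split, 1 if greater.
    split='mean' uses the average of `values`."""
    if split == 'mean':
        split = sum(values) / len(values)
    return [0 if v <= split else 1 for v in values]

y_true = [10, 20, 30, 40]
print(binarize(y_true, 'mean'))  # mean is 25 -> [0, 0, 1, 1]
print(binarize(y_true, 30))      # [0, 0, 0, 1]
```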
- transparentai.fairness.fairness_plots.plot_bias_one_attr(ax, metric, score)
  Plots a bias metric score bar with an indication of whether it is considered fair or not.
  Parameters:
  - ax (plt.axes.Axes) – axis to add the plot to
  - metric (str) – The name of the metric
  - score (float) – Score value of the metric
- transparentai.fairness.fairness_plots.plot_fairness_text(ax, score, metric)
  Plots the bias metric explanation text.
  The text is retrieved by the fairness_metrics_text() function.
  Parameters:
  - ax (plt.axes.Axes) – axis to add the plot to
  - metric (str) – The name of the metric
  - score (float) – Score value of the metric
Monitoring plot functions
- transparentai.monitoring.monitoring_plots.plot_monitoring(y_true, y_pred, timestamp=None, interval='month', metrics=None, classification=False, **kwargs)
  Plots model performance over a timestamp array which represents the date or timestamp of each prediction.
  If timestamp or interval is None then it just computes the metrics on all the predictions.
  If interval is not None it can be one of the following: 'year', 'month', 'day' or 'hour'.
  - 'year': format '%Y'
  - 'month': format '%Y-%m'
  - 'day': format '%Y-%m-%d'
  - 'hour': format '%Y-%m-%d-%r'
  If it's a classification problem and you're passing probabilities as y_pred, don't forget to set the classification=True argument!
  You can use the metrics of your choice; for that, refer to the evaluation metrics documentation.
  Parameters:
  - y_true (array like) – True labels
  - y_pred (array like (1D or 2D)) – if 1D array, predicted labels; if 2D array, probabilities (output of a predict_proba function)
  - timestamp (array like or None (default None)) – Array of datetimes when the predictions occurred
  - interval (str or None (default 'month')) – interval used to format the timestamp
  - metrics (list (default None)) – List of metrics to compute
  - classification (bool (default False)) – Whether the ML task is a classification or not
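Grouping predictions by the interval formats listed above can be sketched with the standard library. This illustrates the documented bucketing under the assumption that the format strings are applied with strftime; the '%r' format for 'hour' is kept as documented but is locale dependent, and the helper name group_by_interval is hypothetical:

```python
from collections import defaultdict
from datetime import datetime

# strftime formats per interval, as listed in the documentation
FORMATS = {'year': '%Y', 'month': '%Y-%m', 'day': '%Y-%m-%d', 'hour': '%Y-%m-%d-%r'}

def group_by_interval(timestamps, interval='month'):
    """Bucket prediction indices by their formatted timestamp."""
    fmt = FORMATS[interval]
    buckets = defaultdict(list)
    for i, ts in enumerate(timestamps):
        buckets[ts.strftime(fmt)].append(i)
    return dict(buckets)

stamps = [datetime(2020, 1, 5), datetime(2020, 1, 20), datetime(2020, 2, 1)]
print(group_by_interval(stamps, 'month'))  # {'2020-01': [0, 1], '2020-02': [2]}
```

Metrics would then be computed once per bucket, giving the performance-over-time curve the function plots.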