transparentai.utils
This submodule contains utility functions for the transparentai module.
Reports functions
transparentai.utils.reports.generate_head_page(document_title)
Generate a figure with a given title.
Parameters: document_title (str) – Name of the document
Returns: Document head figure
Return type: matplotlib.figure.Figure
transparentai.utils.reports.generate_validation_report(model, X, y_true, X_valid=None, y_true_valid=None, metrics=None, model_type='classification', out='validation_report.pdf')
Generate a PDF report on the model's performance with the following graphics:
- First page with the report title
- A histogram of the y_true distribution
- Model performance plot
- Model feature importance plot
This function is useful for keeping a record of the validation.
Parameters:
- model – Model to analyse
- X (array like) – Features
- y_true (array like) – True labels
- X_valid (array like (default None)) – Features for validation set
- y_true_valid (array like (default None)) – True labels for validation set
- metrics (list (default None)) – List of metrics to plot
- model_type (str (default 'classification')) – 'classification' or 'regression'
- out (str (default 'validation_report.pdf')) – Path where the report is saved
Raises: ValueError – model_type must be 'classification' or 'regression'
Utility functions
transparentai.utils.utils.encode_categorical_vars(df)
Encodes the categorical variables of a dataframe as numerical (discrete) values. It uses the LabelEncoder class from scikit-learn.
Parameters: df (pd.DataFrame) – Dataframe to update
Returns:
- pd.DataFrame – Encoded dataframe
- dict – Encoders, with feature names as keys and encoders as values
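The behaviour described above can be approximated with a minimal sketch. It is not the library's implementation: the name encode_categorical_vars_sketch is hypothetical, and treating every non-numeric column as categorical, casting values to str, and returning a copy instead of updating in place are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categorical_vars_sketch(df):
    """Label-encode every non-numeric column; return (encoded_df, encoders)."""
    encoded = df.copy()  # assumption: leave the caller's dataframe untouched
    encoders = {}
    for col in encoded.columns:
        if pd.api.types.is_numeric_dtype(encoded[col]):
            continue  # numeric columns are kept as they are
        encoder = LabelEncoder()
        # LabelEncoder assigns integer codes in sorted order of the classes
        encoded[col] = encoder.fit_transform(encoded[col].astype(str))
        encoders[col] = encoder  # kept so the codes can be decoded later
    return encoded, encoders


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
encoded_df, encoders = encode_categorical_vars_sketch(df)

# The stored encoder maps the integer codes back to the original labels.
original = encoders["color"].inverse_transform(encoded_df["color"])
```

Keeping one encoder per feature is what makes the returned dict useful: the same fitted encoder can decode predictions or encode new data consistently.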
transparentai.utils.utils.find_dtype(arr, len_sample=1000)
Find the general dtype of an array. Three dtypes are possible:
- Number
- Datetime
- Object
Parameters:
- arr (array-like) – Array to inspect
- len_sample (int (default 1000)) – Maximum number of items to analyse; if len_sample > len(arr), len(arr) is used
Returns: dtype string ('number', 'datetime' or 'object')
Return type: str
Raises: TypeError – arr is not an array like
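The sampling logic can be illustrated with a plain-Python sketch. This is an assumption-laden illustration, not the library code: find_dtype_sketch is a hypothetical name, only list and tuple are accepted here (the real function also handles NumPy and pandas containers), and returning 'object' for an empty input is an arbitrary choice.

```python
from datetime import date, datetime
from numbers import Number


def find_dtype_sketch(arr, len_sample=1000):
    """Return 'number', 'datetime' or 'object' from a sample of arr."""
    if not isinstance(arr, (list, tuple)):
        raise TypeError("arr is not an array like")
    # Slicing never goes past the end, so this uses at most len(arr) items.
    sample = arr[:len_sample]
    if sample and all(isinstance(v, Number) for v in sample):
        return "number"
    if sample and all(isinstance(v, (date, datetime)) for v in sample):
        return "datetime"
    return "object"  # mixed, textual, or empty input


kind_numbers = find_dtype_sketch([1, 2.5, 3])
kind_dates = find_dtype_sketch([datetime(2020, 1, 1), datetime(2021, 6, 30)])
kind_mixed = find_dtype_sketch(["a", 1])
```

Inspecting only the first len_sample items keeps the check cheap on large arrays, at the cost of missing a stray value that appears later.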
transparentai.utils.utils.format_describe_str(desc, max_len=20)
Returns a formatted list for the matplotlib table cellText argument.
Each element of the list looks like this: ['key ', 'value '].
The number of spaces at the end of the value depends on the max_len argument.
Returns: Formatted list for the matplotlib table cellText argument
Return type: list
transparentai.utils.utils.init_corr_matrix(columns, index, fill_diag=1.0)
Returns an n-by-m matrix filled with 0 (except on the diagonal if the matrix is square). Recommended for correlation matrices.
Parameters:
- columns – List of column names
- index – List of index names
- fill_diag (float (default 1.)) – If the matrix is square, the diagonal is set to this value
Returns: Initialized matrix
Return type: pd.DataFrame
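As a rough sketch of the initialization described above (the name init_corr_matrix_sketch is hypothetical, and comparing the lengths of index and columns to decide whether the matrix is square is an assumption about how the check is done):

```python
import numpy as np
import pandas as pd


def init_corr_matrix_sketch(columns, index, fill_diag=1.0):
    """Build a zero-filled DataFrame; set the diagonal only when it is square."""
    values = np.zeros((len(index), len(columns)))
    if len(index) == len(columns):
        # In-place NumPy helper that writes fill_diag on the main diagonal.
        np.fill_diagonal(values, fill_diag)
    return pd.DataFrame(values, index=index, columns=columns)


square = init_corr_matrix_sketch(["a", "b"], ["a", "b"])
rectangular = init_corr_matrix_sketch(["a", "b", "c"], ["x", "y"])
```

A diagonal of 1.0 matches the fact that a variable's correlation with itself is 1, which is why the default suits correlation matrices.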
transparentai.utils.utils.is_array_like(obj, n_dims=1)
Returns whether an object is array-like. Valid types are list, np.ndarray, pd.Series and pd.DataFrame.
Parameters:
- obj – Object to inspect
- n_dims (int (default 1)) – Number of dimensions accepted
Returns: Whether the object is array-like or not
Return type: bool
transparentai.utils.utils.preprocess_metrics(input_metrics, metrics_dict)
Preprocess the input metrics so that each one maps to the appropriate function in the metrics_dict global variable.
input_metrics can contain strings or functions. A string has to be a key of the metrics_dict global variable.
Returns a dictionary with each metric's name as key and the metric function as value.
Returns: Dictionary with each metric's name as key and the metric function as value
Return type: dict
Raises: TypeError – input_metrics must be a list
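The name-to-function resolution can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: preprocess_metrics_sketch, accuracy and error_count are hypothetical names, using a function's __name__ as its key is an assumption, and so are the errors raised for unknown names and unsupported entries.

```python
def preprocess_metrics_sketch(input_metrics, metrics_dict):
    """Map each requested metric to a callable, keyed by the metric's name."""
    if not isinstance(input_metrics, list):
        raise TypeError("input_metrics must be a list")
    resolved = {}
    for metric in input_metrics:
        if isinstance(metric, str):
            if metric not in metrics_dict:
                # Assumption: unknown names are rejected explicitly.
                raise ValueError(f"{metric} is not a key of metrics_dict")
            resolved[metric] = metrics_dict[metric]
        elif callable(metric):
            # Assumption: custom functions are keyed by their own name.
            resolved[metric.__name__] = metric
        else:
            raise TypeError("each metric must be a str or a function")
    return resolved


def accuracy(y_true, y_pred):
    """Toy metric: share of positions where the prediction is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_count(y_true, y_pred):
    """Toy custom metric: number of wrong predictions."""
    return sum(t != p for t, p in zip(y_true, y_pred))


metrics = preprocess_metrics_sketch(["accuracy", error_count], {"accuracy": accuracy})
scores = {name: fn([1, 0, 1, 1], [1, 1, 1, 1]) for name, fn in metrics.items()}
```

Resolving everything to a single name-to-function dictionary lets downstream code loop over metrics uniformly, whether they were requested by name or passed as functions.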