transparentai.utils

This submodule contains utility functions for the transparentai module.

Report functions

transparentai.utils.reports.generate_head_page(document_title)

Generate a figure with a given title.

Parameters: document_title (str) – Name of the document
Returns: Document head figure
Return type: matplotlib.figure.Figure

transparentai.utils.reports.generate_validation_report(model, X, y_true, X_valid=None, y_true_valid=None, metrics=None, model_type='classification', out='validation_report.pdf')

Generate a PDF report on the model's performance with the following graphics:

  • First page with the report title
  • A histogram of the y_true distribution
  • Model performance plot
  • Model feature importance plot

This function is useful for keeping a record of the validation.

Parameters:
  • model – Model to analyse
  • X (array like) – Features
  • y_true (array like) – True labels
  • X_valid (array like (default None)) – Features for validation set
  • y_true_valid (array like (default None)) – True labels for validation set
  • metrics (list (default None)) – List of metrics to plot
  • model_type (str (default 'classification')) – ‘classification’ or ‘regression’
  • out (str (default 'validation_report.pdf')) – Path where the report is saved
Raises:

ValueError – if model_type is not ‘classification’ or ‘regression’

Utility functions

transparentai.utils.utils.encode_categorical_vars(df)

Encodes the categorical variables of a dataframe as numerical (discrete) values. It uses the LabelEncoder class from scikit-learn, with one encoder per encoded feature.

Parameters: df (pd.DataFrame) – Dataframe to update
Returns:
  • pd.DataFrame – Encoded dataframe
  • dict – Encoders, with feature names as keys and encoders as values
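
To make the idea of label encoding concrete, here is a minimal pure-Python sketch. It is not the library's implementation: it assumes the columns are held in a plain dict of lists instead of a pd.DataFrame, treats a column as categorical only when all of its values are strings, and mimics scikit-learn's LabelEncoder convention of numbering the sorted unique values from 0. The name `encode_categorical_columns` is hypothetical.

```python
def encode_categorical_columns(columns):
    """Encode string-valued columns as integer codes.

    `columns` maps a column name to a list of values (a stand-in for a
    DataFrame). Returns (encoded_columns, encoders), where each encoder
    is a dict mapping an original value to its integer code.
    """
    encoded, encoders = {}, {}
    for name, values in columns.items():
        # Assumption of this sketch: a column is categorical if every value is a str.
        if all(isinstance(v, str) for v in values):
            # Sorted unique values receive the codes 0..n-1.
            classes = sorted(set(values))
            mapping = {value: code for code, value in enumerate(classes)}
            encoders[name] = mapping
            encoded[name] = [mapping[v] for v in values]
        else:
            # Non-categorical columns are copied unchanged.
            encoded[name] = list(values)
    return encoded, encoders


data = {"color": ["red", "blue", "red", "green"], "size": [1, 2, 3, 4]}
encoded, encoders = encode_categorical_columns(data)
print(encoded["color"])   # [2, 0, 2, 1]
print(encoders["color"])  # {'blue': 0, 'green': 1, 'red': 2}
```

As with the documented return value, the second element is keyed by feature name, so the mapping for a column can be reused later to decode or to encode new data consistently.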

transparentai.utils.utils.find_dtype(arr, len_sample=1000)

Find the general dtype of an array. Three dtypes are possible:

  • Number
  • Datetime
  • Object
Parameters:
  • arr (array-like) – Array to inspect
  • len_sample (int (default 1000)) – Maximum number of items to analyse; if len_sample > len(arr), len(arr) is used
Returns:

dtype string (‘number’, ‘datetime’ or ‘object’)

Return type:

str

Raises:

TypeError – if arr is not array-like
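
The sampling behaviour described above can be sketched as follows. This is an illustrative stand-in, not the library's code: the name `guess_dtype` is hypothetical, only lists and tuples are accepted, and the classification looks at Python types only (it does not try to parse strings as numbers or dates, which the real function may do).

```python
from datetime import date, datetime
from numbers import Number


def guess_dtype(arr, len_sample=1000):
    """Classify a list or tuple as 'number', 'datetime' or 'object'."""
    if not isinstance(arr, (list, tuple)):
        raise TypeError("arr is not array-like")
    # Inspect at most len_sample items; slicing past the end simply
    # returns the whole sequence, which covers len_sample > len(arr).
    sample = arr[:len_sample]
    # Assumption of this sketch: booleans are not counted as numbers.
    if sample and all(
        isinstance(x, Number) and not isinstance(x, bool) for x in sample
    ):
        return "number"
    if sample and all(isinstance(x, (date, datetime)) for x in sample):
        return "datetime"
    return "object"


print(guess_dtype([1, 2.5, 3]))                  # number
print(guess_dtype([datetime(2020, 1, 1)]))       # datetime
print(guess_dtype(["a", 1]))                     # object
print(guess_dtype([1, 2, "x"], len_sample=2))    # number
```

The last call shows the trade-off of sampling: only the first two items are inspected, so the trailing string goes unnoticed.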

transparentai.utils.utils.format_describe_str(desc, max_len=20)

Returns a formatted list for the matplotlib table cellText argument.

Each element of the list has the form ['key ', 'value '].

The number of spaces at the end of the value depends on the max_len argument.

Parameters:
  • desc (dict) – Dictionary returned by the variable.describe function
  • max_len (int (default 20)) – Maximum length for the values
Returns:

Formatted list for the matplotlib table cellText argument

Return type:

list(list)
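
The padding described above might look like the sketch below. It is an assumption-laden illustration, not the library's code: the name `format_describe_rows` is hypothetical, values longer than max_len are truncated, and keys are padded to the width of the longest key.

```python
def format_describe_rows(desc, max_len=20):
    """Turn a dict of summary statistics into padded [key, value] rows."""
    # Assumption: keys are padded to the width of the longest key.
    key_width = max((len(str(k)) for k in desc), default=0)
    rows = []
    for key, value in desc.items():
        # Assumption: values longer than max_len are cut off.
        text = str(value)[:max_len]
        # Pad the value with trailing spaces up to max_len characters.
        rows.append([str(key).ljust(key_width), text.ljust(max_len)])
    return rows


rows = format_describe_rows({"mean": 3.5, "count": 10}, max_len=6)
print(rows)  # [['mean ', '3.5   '], ['count', '10    ']]
```

Every value string ends up exactly max_len characters wide, which keeps the cells of a text table aligned.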

transparentai.utils.utils.init_corr_matrix(columns, index, fill_diag=1.0)

Returns an n-by-m matrix filled with 0 (except on the diagonal if the matrix is square). Recommended for initializing a correlation matrix.

Parameters:
  • columns – List of column names
  • index – List of index names
  • fill_diag (float (default 1.)) – If the matrix is square, value used to fill the diagonal
Returns:

Initialized matrix

Return type:

pd.DataFrame
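
The initialization rule can be sketched in pure Python. The documented function returns a pd.DataFrame labelled with the given columns and index; the hypothetical `init_matrix` below uses nested lists instead, purely to show when the diagonal is filled.

```python
def init_matrix(columns, index, fill_diag=1.0):
    """Build a len(index) x len(columns) matrix of zeros as nested lists."""
    matrix = [[0.0 for _ in columns] for _ in index]
    # The diagonal is only set when the matrix is square.
    if len(columns) == len(index):
        for i in range(len(index)):
            matrix[i][i] = fill_diag
    return matrix


square = init_matrix(["a", "b"], ["a", "b"])
rectangular = init_matrix(["a", "b", "c"], ["x"])
print(square)       # [[1.0, 0.0], [0.0, 1.0]]
print(rectangular)  # [[0.0, 0.0, 0.0]]
```

A diagonal of 1.0 matches the fact that every variable correlates perfectly with itself, which is why the default suits correlation matrices.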

transparentai.utils.utils.is_array_like(obj, n_dims=1)

Returns whether an object is array-like. Valid types are list, np.ndarray, pd.Series and pd.DataFrame.

Parameters:
  • obj – Object to inspect
  • n_dims (int (default 1)) – Number of dimensions accepted
Returns:

Whether the object is array-like

Return type:

bool

transparentai.utils.utils.preprocess_metrics(input_metrics, metrics_dict)

Preprocess the input metrics so that each one maps to the appropriate function in the metrics_dict global variable.

input_metrics can contain strings or functions. A string must be a key of the metrics_dict global variable.

Returns a dictionary with each metric’s name as key and the metric function as value.

Parameters:
  • input_metrics (list) – List of metrics to compute
  • metrics_dict (dict) – Dictionary to compare input_metrics with
Returns:

Dictionary with each metric’s name as key and the metric function as value

Return type:

dict

Raises:

TypeError – if input_metrics is not a list
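
The name-or-function resolution described above can be sketched like this. It is a simplified illustration under stated assumptions, not the library's code: `resolve_metrics`, `accuracy`, `error_rate` and `registry` are hypothetical names, a function's key is assumed to be its `__name__`, and the handling of unknown names and unsupported item types is a guess.

```python
def resolve_metrics(input_metrics, metrics_dict):
    """Map a list of metric names or callables to a {name: function} dict."""
    if not isinstance(input_metrics, list):
        raise TypeError("input_metrics must be a list")
    resolved = {}
    for metric in input_metrics:
        if isinstance(metric, str):
            # Assumption: an unknown name is reported as a ValueError.
            if metric not in metrics_dict:
                raise ValueError(f"unknown metric: {metric}")
            resolved[metric] = metrics_dict[metric]
        elif callable(metric):
            # Assumption: a function is keyed by its own name.
            resolved[metric.__name__] = metric
        else:
            # Assumption: anything else is rejected.
            raise TypeError("each metric must be a str or a callable")
    return resolved


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return 1 - accuracy(y_true, y_pred)


registry = {"accuracy": accuracy}
metrics = resolve_metrics(["accuracy", error_rate], registry)
print(sorted(metrics))                                   # ['accuracy', 'error_rate']
print(metrics["accuracy"]([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
```

Resolving everything to a name-to-function dict up front lets downstream code loop over metrics uniformly, whether they were given by name or as custom functions.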