Allonia Python model management

Allonia models are managed through the dedicated AleiaModel library. It helps you manage all functions related to the development, training, evaluation & monitoring of your models.

The AleiaModel Python library can be used in the following platform objects:

  • Notebooks

  • Modules

  • User-services

Below is a quick example of using AleiaModel to create, train, evaluate, & call the prediction function of a scikit-learn classification model.

For more details about the available functions & model metadata, you can call help(AleiaModel) at any time.

Import the AleiaModel library

Code:

import aleiamodel
from aleiamodel import AleiaModel

Initialize the model

Code:

model = AleiaModel("model_titanic_svc_proba")

Define the data to be used for model training

Code:

# Load the dataset
df = aleialib.s3.load_file("dataset/dataset.csv", handle_type=True)

# Cleaning & preprocessing
print("raw data")
print(df)
#df['Gender'].replace(['Female', 'Male'], [0, 1], inplace=True)
print("clean data")
print(df)

pred, targ = ["Weight", "Height", "Age"], ["Gender"]
print(pred)
print(targ)

Output example:

Loading notebooks/dataset/dataset.csv...
...success
raw data
     Age  Height  Weight  Gender
ID
1     25     5.5     120    Male
2     32     5.7     150    Male
3     21     5.2     110  Female
4     29     5.8     145    Male
5     34     5.5     155  Female
..   ...     ...     ...     ...
132   23     5.0     100    Male
133   26     5.6     135  Female
134   33     6.1     180    Male
135   28     5.7     145  Female
136   20     5.2     110    Male

[136 rows x 4 columns]
clean data
     Age  Height  Weight  Gender
ID
1     25     5.5     120    Male
2     32     5.7     150    Male
3     21     5.2     110  Female
4     29     5.8     145    Male
5     34     5.5     155  Female
..   ...     ...     ...     ...
132   23     5.0     100    Male
133   26     5.6     135  Female
134   33     6.1     180    Male
135   28     5.7     145  Female
136   20     5.2     110    Male

[136 rows x 4 columns]
['Weight', 'Height', 'Age']
['Gender']
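The encoding step above is commented out because this walkthrough predicts the Gender label directly. If you did need to encode a categorical column during cleaning, a minimal standalone pandas sketch (using illustrative data, not the platform dataset) would look like:

```python
import pandas as pd

# Illustrative data only; the real dataset is loaded from S3 as shown above
df = pd.DataFrame({"Gender": ["Female", "Male", "Female"]})

# Map categorical labels to integers (0 = Female, 1 = Male)
df["Gender"] = df["Gender"].replace(["Female", "Male"], [0, 1])
print(df["Gender"].tolist())  # [0, 1, 0]
```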

Set model parameters & custom functions

Code:

import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

model.add_validator("steven misin", "steven.misin@aleia.com", "admin")  # Mail notification is not yet available
data_path = "dataset/dataset.csv"  # Same dataset as loaded above
model.raw_set = data_path
model.observations_set = data_path  # In practice this should be a different set; the raw set is reused here for the example
model.set_set_sizes(0.1, 0.2)  # validation is 10% of the raw set, test is 20% of the raw set
model.set_variables(pred, targ)

if model.new:
    # For a probabilistic classifier (like SVC), set probability=True so that predict_proba is the method wrapped for Seldon
    model.model = SVC(kernel='linear', probability=True)

# Model evaluation
def get_metrics(x, y):
    accuracy = accuracy_score(x, y)

    cm = pd.DataFrame(
        columns=model.classes,
        index=model.classes,
        data=confusion_matrix(x, y, labels=model.classes)
    )

    return {"accuracy": accuracy, "cm": cm}

model.compute_metrics_function = get_metrics

# Feature engineering
def feature_engineering(dfRaw):
    #dfRaw['Gender'].replace(['Female', 'Male'], [0, 1], inplace=True)
    derived = dfRaw
    return derived

model.feature_engineering_function = feature_engineering
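The get_metrics function above depends only on scikit-learn and pandas, so it can be exercised on its own. A self-contained sketch of the same computation, with a hard-coded class tuple standing in for model.classes:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

classes = ("Female", "Male")  # stands in for model.classes

def get_metrics(x, y):
    accuracy = accuracy_score(x, y)
    cm = pd.DataFrame(
        columns=classes,
        index=classes,
        data=confusion_matrix(x, y, labels=classes),
    )
    return {"accuracy": accuracy, "cm": cm}

truth = ["Female", "Male", "Male", "Female"]
preds = ["Female", "Male", "Female", "Female"]
metrics = get_metrics(truth, preds)
print(metrics["accuracy"])  # 0.75
print(metrics["cm"])
```

The confusion matrix rows index the true labels and the columns the predicted labels, so the diagonal holds the correctly classified counts.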

Execute model training pipeline as a dry-run

Code:

model.health_check_set = df[:5]
model.health_check()

Output example:

Loading notebooks/dataset/dataset.csv...
...success
True

Execute model training pipeline

Code:

model.learn()

Output example:

Loading notebooks/dataset/dataset.csv...
...success
({'accuracy': 0.7142857142857143,
  'cm':         Female  Male
  Female       5     2
  Male         2     5},
{'accuracy': 0.5714285714285714,
  'cm':         Female  Male
  Female       8     4
  Male         8     8})
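learn() returns the validation and test metrics produced by the compute_metrics_function. As a quick sanity check, the reported accuracies are consistent with the confusion matrices above: accuracy is the diagonal (correct predictions) divided by the total count.

```python
import numpy as np

# Confusion matrices reported by learn() above (rows: truth, columns: prediction)
validation_cm = np.array([[5, 2], [2, 5]])
test_cm = np.array([[8, 4], [8, 8]])

for name, cm in [("validation", validation_cm), ("test", test_cm)]:
    accuracy = np.trace(cm) / cm.sum()
    print(f"{name} accuracy: {accuracy}")
# validation: 10/14 ≈ 0.714, test: 16/28 ≈ 0.571
```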

Get model training pipeline history

Code:

model.learnings_summary

Output example:

timestamp                    revision  dataset_revision                                    duration  args  kwargs  results
2023-09-13 09:53:47.110302          1  {'raw_set': ('notebooks/dataset/dataset.csv', 4)}      0.31    ()      {}  ({'accuracy': 0.7142857142857143, 'cm': ['Fema…

Save the model as a new version & close the current model instance

Code:

model.save()
model.close()

A new Aleia model will be available in the "Model" section of the application.

Access various model metadata

Code:

# Load the latest model version
model = AleiaModel("model_titanic_svc_proba")
# Load a specific model version
#model = AleiaModel("model_titanic_svc_proba", revision=1)  # or version=s3_version_uuid

# Get model info
print('Model usage history:')
print(model.open_by_history)
print('Model training summary:')
print(model.learnings_summary)
print('Model documentation:')
print(model.description)
print('Model data source:')
print(model.raw_set)
print('Model classes:')
print(model.classes)
model.close()

Output example:

Loading model model_titanic_svc_proba with version ab085f3e-bdcb-4cd1-9e26-a9cd974354b6 created on 2023-09-13T09:54:42.089696.
Loading notebooks/model/model_titanic_svc_proba/model.aleiamodel...
...success
Model usage history:
{'2023-09-13 09:52:35.048120': {'open_by': 'track 56769f41-ff16-4564-ab1e-2ff4fd2779aa in project b91620c6-23e0-4797-867e-d1ed1ea76873 by user f842e341-1564-44d3-9deb-8038fc3dacf9 in file 1749091114.py', 'closed_at': '2023-09-13 09:54:43.451763'}, '2023-09-13 09:54:57.351331': {'open_by': 'track 56769f41-ff16-4564-ab1e-2ff4fd2779aa in project b91620c6-23e0-4797-867e-d1ed1ea76873 by user f842e341-1564-44d3-9deb-8038fc3dacf9 in file 2389158426.py'}}
Model training summary:
                            revision  \
timestamp
2023-09-13 09:53:47.110302         1

                                                             dataset_revision  \
timestamp
2023-09-13 09:53:47.110302  {'raw_set': ('notebooks/dataset/dataset.csv', 4)}

                            duration args kwargs  \
timestamp
2023-09-13 09:53:47.110302      0.31   ()     {}

                                                                      results
timestamp
2023-09-13 09:53:47.110302  ({'accuracy': 0.7142857142857143, 'cm': ['Fema...
Model documentation:
{'Summary': 'MISSING -- Details about functional context, model objectives, and various stakeholders.', 'Status': 'MISSING -- Details about model lifecycle status. Is it still in experiment or is it live ?', 'Data': 'MISSING -- Details about data used by the model, through exploration to training. Also give details about annotation methodology that was potentially used to create the training dataset.', 'Ethic': 'MISSING -- Details about ethic studies done around model data, if existing biases were addressed or not (through synthetic data for example), and the process used (through ethic comity for example).', 'Training': 'MISSING -- Details about training frequency, data scope, and targeted lifecycle (hot or cold).', 'Explainability': 'MISSING -- Details about model explicability and potential libraries that are used to addressed this topic.', 'Tests': 'MISSING -- Details about test scenarios around the model, parameters control, data preparation, and expected values.', 'Functional validation': 'MISSING -- Details about the model functionnal validation and its methodology. 
Did it involve functional stakeholders doing annotation & validation campaigns and how was it done ?', 'Activations': 'MISSING -- Details about rules to control the model activation & deactivation on live environment.', 'Deployment checklist': 'MISSING -- List of requirements to check before being officially able to trigger a new model deployment.', 'Architecture': {'Model': 'SVC', 'Requirements': ['re==2.2.1', 'json==2.0.9', 'platform==1.0.8', '_ctypes==1.1.0', 'ctypes==1.1.0', 'zmq.sugar.version==25.1.1', 'zmq.sugar==25.1.1', 'zmq==25.1.1', 'logging==0.5.1.2', 'traitlets._version==5.9.0', 'traitlets==5.9.0', 'zlib==1.0', "_curses==b'2.2'", 'socketserver==0.4', 'argparse==1.1', 'dateutil==2.8.1', 'six==1.15.0', '_decimal==1.70', 'decimal==1.70', 'platformdirs.version==3.10.0', 'platformdirs==3.10.0', '_csv==1.0', 'csv==1.0', 'executing.version==1.2.0', 'executing==1.2.0', 'pure_eval.version==0.2.2', 'pure_eval==0.2.2', 'stack_data.version==0.6.2', 'stack_data==0.6.2', 'pygments==2.16.1', 'ptyprocess==0.7.0', 'pexpect==4.8.0', 'pickleshare==0.7.5', 'backcall==0.2.0', 'decorator==5.1.1', 'wcwidth==0.2.5', 'prompt_toolkit==3.0.39', 'parso==0.8.3', 'jedi==0.19.0', 'urllib.request==3.9', 'comm==0.1.4', 'psutil==5.9.5', 'xmlrpc.client==3.9', 'http.server==0.6', 'pkg_resources._vendor.more_itertools==9.1.0', 'pkg_resources.extern.more_itertools==9.1.0', 'pkg_resources._vendor.platformdirs.version==2.6.2', 'pkg_resources._vendor.platformdirs==2.6.2', 'pkg_resources.extern.platformdirs==2.6.2', 'pkg_resources._vendor.packaging==23.1', 'pkg_resources.extern.packaging==23.1', '_pydevd_frame_eval.vendored.bytecode==0.13.0.dev', '_pydev_bundle.fsnotify==0.1.5', 'pydevd==2.9.5', 'packaging==23.1', 'numpy.version==1.24.4', 'numpy.core._multiarray_umath==3.1', 'numpy.core==1.24.4', 'numpy.linalg._umath_linalg==0.1.5', 'numpy.lib==1.24.4', 'numpy==1.24.4', 'scipy==1.11.2', 'scipy.sparse.linalg._isolve._iterative==1.21.6', 'scipy._lib.decorator==4.0.5', 
'scipy.linalg._fblas==1.21.6', 'scipy.linalg._flapack==1.21.6', 'scipy.linalg._flinalg==1.21.6', 'scipy.sparse.linalg._eigen.arpack._arpack==1.21.6', 'setuptools._distutils==3.9.18', 'setuptools.version==68.0.0', 'setuptools._vendor.packaging==23.1', 'setuptools.extern.packaging==23.1', 'setuptools._vendor.ordered_set==3.1', 'setuptools.extern.ordered_set==3.1', 'setuptools._vendor.more_itertools==8.8.0', 'setuptools.extern.more_itertools==8.8.0', 'setuptools==68.0.0', 'distutils==3.9.18', 'joblib.externals.cloudpickle==2.0.0', 'joblib.externals.loky==3.0.0', 'joblib==1.1.0', 'sklearn.utils._joblib==1.1.0', 'scipy.special._specfun==1.21.6', 'scipy.optimize._minpack2==1.21.6', 'scipy.optimize._lbfgsb==1.21.6', 'scipy.optimize._cobyla==1.21.6', 'scipy.optimize._slsqp==1.21.6', 'scipy.optimize.__nnls==1.21.6', 'scipy.linalg._interpolative==1.21.6', 'scipy.integrate._vode==1.21.6', 'scipy.integrate._dop==1.21.6', 'scipy.integrate._lsoda==1.21.6', 'scipy.interpolate.dfitpack==1.21.6', 'scipy._lib._uarray==0.8.8.dev0+aa94c5a4.scipy', 'scipy.stats._statlib==1.21.6', 'scipy.stats._mvn==1.21.6', 'threadpoolctl==3.2.0', 'sklearn.base==1.3.0', 'sklearn.utils._show_versions==1.3.0', 'sklearn==1.3.0', 'PIL._version==10.0.0', 'PIL==10.0.0', 'defusedxml==0.7.1', 'cffi==1.15.1', 'PIL.Image==10.0.0', 'pyparsing==3.1.1', 'cycler==0.10.0', 'kiwisolver._cext==1.4.4', 'kiwisolver==1.4.4', 'matplotlib==3.4.3', 'cloudpickle==2.2.1', 'dill.__info__==0.3.6', 'dill==0.3.6', 'botocore==1.20.106', 'botocore.vendored.six==1.10.0', 'botocore.vendored.requests.packages.urllib3==1.10.4', 'urllib3.packages.six==1.12.0', 'urllib3._version==1.25.11', 'urllib3==1.25.11', 'cgi==2.6', 'cryptography.__about__==41.0.3', 'cryptography==41.0.3', '_cffi_backend==1.15.1', 'ipaddress==1.0', 'OpenSSL.version==23.2.0', 'OpenSSL==23.2.0', 'certifi==2020.06.20', 'jmespath==0.10.0', 'botocore.docs.bcdoc==0.16.0', 'botocore.session==1.20.106', 'boto3==1.17.106', 'pytz==2021.1', 'pyarrow._generated_version==12.0.1', 
'pyarrow==12.0.1', 'pandas==1.3.0', 'chardet.version==3.0.4', 'chardet==3.0.4', 'charset_normalizer.version==3.2.0', 'charset_normalizer==3.2.0', 'requests.packages.urllib3.packages.six==1.12.0', 'requests.packages.urllib3._version==1.25.11', 'requests.packages.urllib3==1.25.11', 'idna.package_data==2.10', 'idna.idnadata==13.0.0', 'idna==2.10', 'requests.packages.idna.package_data==2.10', 'requests.packages.idna.idnadata==13.0.0', 'requests.packages.idna==2.10', 'requests.packages.chardet==3.0.4', 'requests.__version__==2.28.2', 'requests.utils==2.28.2', 'requests==2.28.2', 'werkzeug==2.0.3', 'click==8.0.4', 'asgiref==3.5.0', 'yaml==6.0.1', 'uvicorn==0.17.5', 'a2wsgi==1.4.0', 'greenlet==1.1.2', 'sqlalchemy==1.4.49', 'pymongo.pool==4.2.0', 'pymongo==4.2.0', 's3transfer==0.4.2']}, 'Technical performances':                             Duration (s)
Feature engineering            0.0+/-nan
Train-validation-test split    0.0+/-nan
Training                      0.07+/-nan
Validation                     0.0+/-nan
Test                           0.0+/-nan
Prediction                     nan+/-nan
Postprocess                    nan+/-nan, 'Evaluation': {'Validation metrics': {'accuracy': 0.7142857142857143, 'cm':         Female  Male
Female       5     2
Male         2     5}, 'Test metrics': {'accuracy': 0.5714285714285714, 'cm':         Female  Male
Female       8     4
Male         8     8}}}
Model data source:
Loading notebooks/dataset/dataset.csv...
...success
     Age  Height  Weight  Gender
ID
1     25     5.5     120    Male
2     32     5.7     150    Male
3     21     5.2     110  Female
4     29     5.8     145    Male
5     34     5.5     155  Female
..   ...     ...     ...     ...
132   23     5.0     100    Male
133   26     5.6     135  Female
134   33     6.1     180    Male
135   28     5.7     145  Female
136   20     5.2     110    Male

[136 rows x 4 columns]
Model classes:
('Female', 'Male')
Deleted file notebooks/model/model_titanic_svc_proba/model.lock.

Call the model predict function with the full pipeline

Code:

# Run prediction on an existing model with the full default pipeline
dfPredict = aleialib.s3.load_file("dataset/dataset.csv", handle_type=True)
#model = AleiaModel("model_titanic_svc_proba", read_only=True, revision=1)
model = AleiaModel("model_titanic_svc_proba", read_only=True)
model.observations_set = dfPredict
results = model.apply()
print(results)
model.close()

Output example:

Loading notebooks/dataset/dataset.csv...
...success
Loading model model_titanic_svc_proba with version ab085f3e-bdcb-4cd1-9e26-a9cd974354b6 created on 2023-09-13T09:54:42.089696.
Loading notebooks/model/model_titanic_svc_proba/model.aleiamodel...
...success
['Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male' 'Male'
'Female' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female'
'Female' 'Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male'
'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female'
'Male' 'Female' 'Female' 'Male' 'Male' 'Male' 'Male' 'Female' 'Male'
'Female' 'Male' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male'
'Male' 'Female' 'Male' 'Female' 'Female' 'Male' 'Male' 'Male' 'Male'
'Female' 'Male' 'Female' 'Male' 'Male' 'Male' 'Female' 'Male' 'Male'
'Female' 'Male' 'Male' 'Female' 'Male' 'Female' 'Female' 'Male' 'Male'
'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male' 'Male' 'Female'
'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female' 'Female'
'Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male' 'Male'
'Female' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female'
'Female' 'Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male'
'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Male']
Deleted file notebooks/model/model_titanic_svc_proba/model.lock.
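apply() above returns hard class labels. The probability=True flag passed to SVC during initialization is what makes predict_proba available; a minimal standalone scikit-learn sketch of that behavior, using feature rows taken from the sample dataset above (independent of the platform):

```python
from sklearn.svm import SVC

# Feature order follows pred: [Weight, Height, Age]; rows taken from the sample dataset above
X = [[120, 5.5, 25], [150, 5.7, 32], [110, 5.2, 21], [145, 5.8, 29],
     [155, 5.5, 34], [100, 5.0, 23], [135, 5.6, 26], [180, 6.1, 33],
     [145, 5.7, 28], [110, 5.2, 20]]
y = ["Male", "Male", "Female", "Male", "Female",
     "Male", "Female", "Male", "Female", "Male"]

clf = SVC(kernel="linear", probability=True).fit(X, y)

# predict_proba returns one column per class (sorted order), each row summing to 1
proba = clf.predict_proba([[130, 5.6, 27]])
print(clf.classes_)   # ['Female' 'Male']
print(proba.shape)    # (1, 2)
```

Without probability=True, SVC exposes only decision_function and predict; the flag enables the internal probability calibration that predict_proba relies on.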