Allonia Python model management
Allonia models are managed through the dedicated AleiaModel library, which helps you manage all functions related to the development, training, evaluation & monitoring of your models.
The AleiaModel Python library can be used in the following platform objects:
- Notebooks
- Modules
- User-services
Below is a quick example that uses AleiaModel to create, train, evaluate & call the prediction function of a scikit-learn classification model.
For more details about the available functions & model metadata, run help(AleiaModel) at any time.
Define data to be used for model training
Code:
## Load the dataset
df = aleialib.s3.load_file("dataset/dataset.csv", handle_type=True)
## Cleaning & preprocessing
print("raw data")
print(df)
#df['Gender'].replace(['Female', 'Male'], [0, 1], inplace = True)
print("clean data")
print(df)
pred, targ = ["Weight", "Height", "Age"], ["Gender"]
print(pred)
print(targ)
Output example:
Loading notebooks/dataset/dataset.csv...
...success
raw data
Age Height Weight Gender
ID
1 25 5.5 120 Male
2 32 5.7 150 Male
3 21 5.2 110 Female
4 29 5.8 145 Male
5 34 5.5 155 Female
.. ... ... ... ...
132 23 5.0 100 Male
133 26 5.6 135 Female
134 33 6.1 180 Male
135 28 5.7 145 Female
136 20 5.2 110 Male
[136 rows x 4 columns]
clean data
Age Height Weight Gender
ID
1 25 5.5 120 Male
2 32 5.7 150 Male
3 21 5.2 110 Female
4 29 5.8 145 Male
5 34 5.5 155 Female
.. ... ... ... ...
132 23 5.0 100 Male
133 26 5.6 135 Female
134 33 6.1 180 Male
135 28 5.7 145 Female
136 20 5.2 110 Male
[136 rows x 4 columns]
['Weight', 'Height', 'Age']
['Gender']
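The commented-out cleaning step maps the Gender labels to integers. As a standalone illustration, here is a minimal pandas sketch on a hypothetical mini-frame (not the platform dataset):

```python
import pandas as pd

# Hypothetical mini-frame mirroring the dataset's columns
df = pd.DataFrame({
    "Age": [25, 32, 21],
    "Height": [5.5, 5.7, 5.2],
    "Weight": [120, 150, 110],
    "Gender": ["Male", "Male", "Female"],
})

# Map the class labels to integers: Female -> 0, Male -> 1
df["Gender"] = df["Gender"].replace(["Female", "Male"], [0, 1])
print(df["Gender"].tolist())  # [1, 1, 0]
```

In the example above this line was left commented out, and the output examples show that the model's classes remain the string labels ('Female', 'Male').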
Set model parameters & custom functions
Code:
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# The model instance was created earlier, e.g. model = AleiaModel("model_titanic_svc_proba")
data_path = "dataset/dataset.csv"  # same dataset path as loaded above
model.add_validator("steven misin", "steven.misin@aleia.com", "admin")  # Mail notification is not yet available
model.raw_set = data_path
model.observations_set = data_path  # Should of course be a different set, but hey, this is a test
model.set_set_sizes(0.1, 0.2)  # validation is 10% of the raw set, test is 20% of the raw set
model.set_variables(pred, targ)
if model.new:
    # For a probabilistic predictor (like SVC), set probability=True so that
    # predict_proba is the function wrapped for Seldon
    model.model = SVC(kernel='linear', probability=True)

# Model evaluation
def get_metrics(x, y):
    accuracy = accuracy_score(x, y)
    cm = pd.DataFrame(
        columns=model.classes,
        index=model.classes,
        data=confusion_matrix(x, y, labels=model.classes)
    )
    return {"accuracy": accuracy, "cm": cm}
model.compute_metrics_function = get_metrics

# Feature engineering
def feature_engineering(dfRaw):
    # dfRaw['Gender'].replace(['Female', 'Male'], [0, 1], inplace=True)
    derived = dfRaw
    return derived
model.feature_engineering_function = feature_engineering
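The metrics helper wired in above is plain scikit-learn. Here is a self-contained sketch of the same pattern, with a hard-coded class list standing in for model.classes and toy labels instead of real predictions:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

classes = ["Female", "Male"]  # stands in for model.classes

def get_metrics(y_true, y_pred):
    """Return accuracy plus a confusion matrix labeled with the class names."""
    accuracy = accuracy_score(y_true, y_pred)
    cm = pd.DataFrame(
        data=confusion_matrix(y_true, y_pred, labels=classes),
        index=classes,    # rows: true label
        columns=classes,  # columns: predicted label
    )
    return {"accuracy": accuracy, "cm": cm}

metrics = get_metrics(
    ["Male", "Female", "Male", "Female"],   # toy ground truth
    ["Male", "Female", "Female", "Female"], # toy predictions (one error)
)
print(metrics["accuracy"])  # 0.75
print(metrics["cm"])
```

Returning the confusion matrix as a labeled DataFrame is what makes the validation and test metrics in the learn() output readable.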
Execute model training pipeline as a dry-run
Code:
model.health_check_set = df[:5]
model.health_check()
Output example:
Loading notebooks/dataset/dataset.csv...
...success
True
Execute model training pipeline
Code:
model.learn()
Output example:
Loading notebooks/dataset/dataset.csv...
...success
({'accuracy': 0.7142857142857143,
'cm': Female Male
Female 5 2
Male 2 5},
{'accuracy': 0.5714285714285714,
'cm': Female Male
Female 8 4
Male 8 8})
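learn() returns the metrics computed on the validation and test splits by the custom get_metrics function. The equivalent flow in plain scikit-learn, sketched on synthetic data (split fractions mirroring set_set_sizes(0.1, 0.2); all names here are illustrative, not AleiaModel internals):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 136-row Weight/Height/Age dataset
X, y = make_classification(n_samples=136, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Carve out the test set (20% of the raw set) ...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ... then the validation set (10% of the raw set, i.e. 0.1/0.8 of the remainder)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest,
                                                  test_size=0.1 / 0.8, random_state=0)

clf = SVC(kernel="linear", probability=True).fit(X_train, y_train)
val_acc = accuracy_score(y_val, clf.predict(X_val))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(val_acc, test_acc)
```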
Get model training pipeline history
Code:
model.learnings_summary
Output example:
| timestamp | revision | dataset_revision | duration | args | kwargs | results |
|---|---|---|---|---|---|---|
| 2023-09-13 09:53:47.110302 | 1 | {'raw_set': ('notebooks/dataset/dataset.csv', 4)} | 0.31 | () | {} | ({'accuracy': 0.7142857142857143, 'cm': ['Fema… |
Save the model as a new version & close the current model instance
Code:
model.save()
model.close()
A new Aleia model will be available in the "Model" section of the application.
Access various model metadata
Code:
# Load the latest model version
model = AleiaModel("model_titanic_svc_proba")
# Load a specific model version
# model = AleiaModel("model_titanic_svc_proba", revision=1)  # or version=s3_version_uuid
# Get model info
print('Model usage history:')
print(model.open_by_history)
print('Model training summary:')
print(model.learnings_summary)
print('Model documentation:')
print(model.description)
print('Model data source:')
print(model.raw_set)
print('Model classes:')
print(model.classes)
model.close()
Output example:
Loading model model_titanic_svc_proba with version ab085f3e-bdcb-4cd1-9e26-a9cd974354b6 created on 2023-09-13T09:54:42.089696.
Loading notebooks/model/model_titanic_svc_proba/model.aleiamodel...
...success
Model usage history:
{'2023-09-13 09:52:35.048120': {'open_by': 'track 56769f41-ff16-4564-ab1e-2ff4fd2779aa in project b91620c6-23e0-4797-867e-d1ed1ea76873 by user f842e341-1564-44d3-9deb-8038fc3dacf9 in file 1749091114.py', 'closed_at': '2023-09-13 09:54:43.451763'}, '2023-09-13 09:54:57.351331': {'open_by': 'track 56769f41-ff16-4564-ab1e-2ff4fd2779aa in project b91620c6-23e0-4797-867e-d1ed1ea76873 by user f842e341-1564-44d3-9deb-8038fc3dacf9 in file 2389158426.py'}}
Model training summary:
revision \
timestamp
2023-09-13 09:53:47.110302 1
dataset_revision \
timestamp
2023-09-13 09:53:47.110302 {'raw_set': ('notebooks/dataset/dataset.csv', 4)}
duration args kwargs \
timestamp
2023-09-13 09:53:47.110302 0.31 () {}
results
timestamp
2023-09-13 09:53:47.110302 ({'accuracy': 0.7142857142857143, 'cm': ['Fema...
Model documentation:
{'Summary': 'MISSING -- Details about functional context, model objectives, and various stakeholders.', 'Status': 'MISSING -- Details about model lifecycle status. Is it still in experiment or is it live ?', 'Data': 'MISSING -- Details about data used by the model, through exploration to training. Also give details about annotation methodology that was potentially used to create the training dataset.', 'Ethic': 'MISSING -- Details about ethic studies done around model data, if existing biases were addressed or not (through synthetic data for example), and the process used (through ethic comity for example).', 'Training': 'MISSING -- Details about training frequency, data scope, and targeted lifecycle (hot or cold).', 'Explainability': 'MISSING -- Details about model explicability and potential libraries that are used to addressed this topic.', 'Tests': 'MISSING -- Details about test scenarios around the model, parameters control, data preparation, and expected values.', 'Functional validation': 'MISSING -- Details about the model functionnal validation and its methodology. 
Did it involve functional stakeholders doing annotation & validation campaigns and how was it done ?', 'Activations': 'MISSING -- Details about rules to control the model activation & deactivation on live environment.', 'Deployment checklist': 'MISSING -- List of requirements to check before being officially able to trigger a new model deployment.', 'Architecture': {'Model': 'SVC', 'Requirements': ['re==2.2.1', 'json==2.0.9', 'platform==1.0.8', '_ctypes==1.1.0', 'ctypes==1.1.0', 'zmq.sugar.version==25.1.1', 'zmq.sugar==25.1.1', 'zmq==25.1.1', 'logging==0.5.1.2', 'traitlets._version==5.9.0', 'traitlets==5.9.0', 'zlib==1.0', "_curses==b'2.2'", 'socketserver==0.4', 'argparse==1.1', 'dateutil==2.8.1', 'six==1.15.0', '_decimal==1.70', 'decimal==1.70', 'platformdirs.version==3.10.0', 'platformdirs==3.10.0', '_csv==1.0', 'csv==1.0', 'executing.version==1.2.0', 'executing==1.2.0', 'pure_eval.version==0.2.2', 'pure_eval==0.2.2', 'stack_data.version==0.6.2', 'stack_data==0.6.2', 'pygments==2.16.1', 'ptyprocess==0.7.0', 'pexpect==4.8.0', 'pickleshare==0.7.5', 'backcall==0.2.0', 'decorator==5.1.1', 'wcwidth==0.2.5', 'prompt_toolkit==3.0.39', 'parso==0.8.3', 'jedi==0.19.0', 'urllib.request==3.9', 'comm==0.1.4', 'psutil==5.9.5', 'xmlrpc.client==3.9', 'http.server==0.6', 'pkg_resources._vendor.more_itertools==9.1.0', 'pkg_resources.extern.more_itertools==9.1.0', 'pkg_resources._vendor.platformdirs.version==2.6.2', 'pkg_resources._vendor.platformdirs==2.6.2', 'pkg_resources.extern.platformdirs==2.6.2', 'pkg_resources._vendor.packaging==23.1', 'pkg_resources.extern.packaging==23.1', '_pydevd_frame_eval.vendored.bytecode==0.13.0.dev', '_pydev_bundle.fsnotify==0.1.5', 'pydevd==2.9.5', 'packaging==23.1', 'numpy.version==1.24.4', 'numpy.core._multiarray_umath==3.1', 'numpy.core==1.24.4', 'numpy.linalg._umath_linalg==0.1.5', 'numpy.lib==1.24.4', 'numpy==1.24.4', 'scipy==1.11.2', 'scipy.sparse.linalg._isolve._iterative==1.21.6', 'scipy._lib.decorator==4.0.5', 
'scipy.linalg._fblas==1.21.6', 'scipy.linalg._flapack==1.21.6', 'scipy.linalg._flinalg==1.21.6', 'scipy.sparse.linalg._eigen.arpack._arpack==1.21.6', 'setuptools._distutils==3.9.18', 'setuptools.version==68.0.0', 'setuptools._vendor.packaging==23.1', 'setuptools.extern.packaging==23.1', 'setuptools._vendor.ordered_set==3.1', 'setuptools.extern.ordered_set==3.1', 'setuptools._vendor.more_itertools==8.8.0', 'setuptools.extern.more_itertools==8.8.0', 'setuptools==68.0.0', 'distutils==3.9.18', 'joblib.externals.cloudpickle==2.0.0', 'joblib.externals.loky==3.0.0', 'joblib==1.1.0', 'sklearn.utils._joblib==1.1.0', 'scipy.special._specfun==1.21.6', 'scipy.optimize._minpack2==1.21.6', 'scipy.optimize._lbfgsb==1.21.6', 'scipy.optimize._cobyla==1.21.6', 'scipy.optimize._slsqp==1.21.6', 'scipy.optimize.__nnls==1.21.6', 'scipy.linalg._interpolative==1.21.6', 'scipy.integrate._vode==1.21.6', 'scipy.integrate._dop==1.21.6', 'scipy.integrate._lsoda==1.21.6', 'scipy.interpolate.dfitpack==1.21.6', 'scipy._lib._uarray==0.8.8.dev0+aa94c5a4.scipy', 'scipy.stats._statlib==1.21.6', 'scipy.stats._mvn==1.21.6', 'threadpoolctl==3.2.0', 'sklearn.base==1.3.0', 'sklearn.utils._show_versions==1.3.0', 'sklearn==1.3.0', 'PIL._version==10.0.0', 'PIL==10.0.0', 'defusedxml==0.7.1', 'cffi==1.15.1', 'PIL.Image==10.0.0', 'pyparsing==3.1.1', 'cycler==0.10.0', 'kiwisolver._cext==1.4.4', 'kiwisolver==1.4.4', 'matplotlib==3.4.3', 'cloudpickle==2.2.1', 'dill.__info__==0.3.6', 'dill==0.3.6', 'botocore==1.20.106', 'botocore.vendored.six==1.10.0', 'botocore.vendored.requests.packages.urllib3==1.10.4', 'urllib3.packages.six==1.12.0', 'urllib3._version==1.25.11', 'urllib3==1.25.11', 'cgi==2.6', 'cryptography.__about__==41.0.3', 'cryptography==41.0.3', '_cffi_backend==1.15.1', 'ipaddress==1.0', 'OpenSSL.version==23.2.0', 'OpenSSL==23.2.0', 'certifi==2020.06.20', 'jmespath==0.10.0', 'botocore.docs.bcdoc==0.16.0', 'botocore.session==1.20.106', 'boto3==1.17.106', 'pytz==2021.1', 'pyarrow._generated_version==12.0.1', 
'pyarrow==12.0.1', 'pandas==1.3.0', 'chardet.version==3.0.4', 'chardet==3.0.4', 'charset_normalizer.version==3.2.0', 'charset_normalizer==3.2.0', 'requests.packages.urllib3.packages.six==1.12.0', 'requests.packages.urllib3._version==1.25.11', 'requests.packages.urllib3==1.25.11', 'idna.package_data==2.10', 'idna.idnadata==13.0.0', 'idna==2.10', 'requests.packages.idna.package_data==2.10', 'requests.packages.idna.idnadata==13.0.0', 'requests.packages.idna==2.10', 'requests.packages.chardet==3.0.4', 'requests.__version__==2.28.2', 'requests.utils==2.28.2', 'requests==2.28.2', 'werkzeug==2.0.3', 'click==8.0.4', 'asgiref==3.5.0', 'yaml==6.0.1', 'uvicorn==0.17.5', 'a2wsgi==1.4.0', 'greenlet==1.1.2', 'sqlalchemy==1.4.49', 'pymongo.pool==4.2.0', 'pymongo==4.2.0', 's3transfer==0.4.2']}, 'Technical performances': Duration (s)
Feature engineering 0.0+/-nan
Train-validation-test split 0.0+/-nan
Training 0.07+/-nan
Validation 0.0+/-nan
Test 0.0+/-nan
Prediction nan+/-nan
Postprocess nan+/-nan, 'Evaluation': {'Validation metrics': {'accuracy': 0.7142857142857143, 'cm': Female Male
Female 5 2
Male 2 5}, 'Test metrics': {'accuracy': 0.5714285714285714, 'cm': Female Male
Female 8 4
Male 8 8}}}
Model data source:
Loading notebooks/dataset/dataset.csv...
...success
Age Height Weight Gender
ID
1 25 5.5 120 Male
2 32 5.7 150 Male
3 21 5.2 110 Female
4 29 5.8 145 Male
5 34 5.5 155 Female
.. ... ... ... ...
132 23 5.0 100 Male
133 26 5.6 135 Female
134 33 6.1 180 Male
135 28 5.7 145 Female
136 20 5.2 110 Male
[136 rows x 4 columns]
Model classes:
('Female', 'Male')
Deleted file notebooks/model/model_titanic_svc_proba/model.lock.
Call model predict function with full pipeline
Code:
# Predict with an existing model using the full default pipeline
dfPredict = aleialib.s3.load_file("dataset/dataset.csv", handle_type=True)
#model = AleiaModel("model_titanic_svc_proba", read_only=True, revision=1)
model = AleiaModel("model_titanic_svc_proba", read_only=True)
model.observations_set = dfPredict
results = model.apply()
print(results)
model.close()
Output example:
Loading notebooks/dataset/dataset.csv...
...success
Loading model model_titanic_svc_proba with version ab085f3e-bdcb-4cd1-9e26-a9cd974354b6 created on 2023-09-13T09:54:42.089696.
Loading notebooks/model/model_titanic_svc_proba/model.aleiamodel...
...success
['Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male' 'Male'
'Female' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female'
'Female' 'Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male'
'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female'
'Male' 'Female' 'Female' 'Male' 'Male' 'Male' 'Male' 'Female' 'Male'
'Female' 'Male' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male'
'Male' 'Female' 'Male' 'Female' 'Female' 'Male' 'Male' 'Male' 'Male'
'Female' 'Male' 'Female' 'Male' 'Male' 'Male' 'Female' 'Male' 'Male'
'Female' 'Male' 'Male' 'Female' 'Male' 'Female' 'Female' 'Male' 'Male'
'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male' 'Male' 'Female'
'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female' 'Female'
'Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male' 'Male' 'Male'
'Female' 'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Female'
'Female' 'Male' 'Male' 'Male' 'Male' 'Female' 'Male' 'Female' 'Male'
'Male' 'Male' 'Female' 'Male' 'Male' 'Female' 'Male' 'Male']
Deleted file notebooks/model/model_titanic_svc_proba/model.lock.
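Because the model was registered with probability=True, the wrapped estimator also exposes per-class probabilities alongside the hard labels shown above. A standalone scikit-learn sketch on toy data (not the platform pipeline; feature values are made up):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy features standing in for Weight/Height/Age
X = rng.normal(size=(40, 3))
y = np.where(X[:, 0] > 0, "Male", "Female")  # linearly separable toy labels

clf = SVC(kernel="linear", probability=True).fit(X, y)

labels = clf.predict(X)        # hard class labels, like the model.apply() output above
proba = clf.predict_proba(X)   # per-class probabilities (Platt-scaled), one row per sample
print(clf.classes_)            # column order of predict_proba: ['Female' 'Male']
print(proba.shape)             # (40, 2)
```

Each row of predict_proba sums to 1, and its columns follow the alphabetical class order reported by clf.classes_, matching the ('Female', 'Male') classes tuple shown in the metadata section.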