Allonia Python utilities
The Allonia Python library is designed to work within the Allonia platform (from notebooks, modules, and user-services). It provides several key features:
- S3 access: load and save data from your track’s storage bucket
- File encoding I/O management
- Text annotation: use our partner’s NLP algorithm to annotate text
- Service access: let executed code call and receive answers from your user-services
- Access our partner Nam.R’s API
- Modules: useful functions to parameterize modules
Overall helper
You can access detailed information about the aleialib functions at any time through the default Python helper:
help(aleialib)
help(aleialib.db)
help(aleialib.s3)
Result example with help(aleialib):
Help on package aleialib:
NAME
aleialib
PACKAGE CONTENTS
db (package)
enums
helpers
namr
nlp
processor
s3 (package)
user_services (package)
DATA
__all__ = ['s3', 'nlp', 'helpers', 'processor', 'user_services', 'db',...
FILE
/home/aleia-user/aleialib/__init__.py
More information is available directly in the library documentation, which you can access through the JupyterLab contextual help:
- Open the "Help" menu and select "Show contextual help", or use the keyboard shortcut "Ctrl + I"
- Select in your code the function you want more information about
S3 access
Use this code to load the dataset.csv file, then save its first two lines as csv-test_output.csv in your track’s storage bucket.
import aleialib
# With file located at: notebooks/dataset/dataset.csv
# With handle_type=True, your file will be converted to a dataframe
df_input = aleialib.s3.load_file("dataset.csv", object_type="dataset", handle_type=True)
# Without the "object_type" parameter, the path starts from the notebooks/ folder
# With file located at: notebooks/dataset/dataset.csv
df_input = aleialib.s3.load_file("dataset/dataset.csv", handle_type=True)
df_output = df_input[:2]
# With handle_type=True, your file will be converted according to its file extension, here CSV
# Your file will be saved at: notebooks/dataset/csv-test_output.csv
aleialib.s3.save_file("csv-test_output.csv", df_output, object_type="dataset", handle_type=True)
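For reference, the same transformation (keep the first two rows of a CSV) can be sketched locally with only the standard library; the aleialib calls above do the equivalent against your track’s storage bucket:

```python
import csv
import io

# Sample CSV content standing in for the file loaded from S3
csv_text = "name,score\nalice,1\nbob,2\ncarol,3\n"

# Read all rows (header + data), the tabular shape that
# load_file with handle_type=True hands back as a dataframe
rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# Keep only the first two data rows, like df_input[:2] above
trimmed = data[:2]

# Serialize back to CSV text, the content save_file would persist
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(trimmed)
print(out.getvalue())
```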
Manage files with specific encoding
Files with a specific encoding need to be loaded and saved through aleialib so that they are handled properly.
Load and save a file with a specific encoding:
import aleialib
# With file located at: notebooks/dataset/test-accents-3-iso-8859-1.csv
df = aleialib.s3.load_file("test-accents-3-iso-8859-1.csv", "iso-8859-1", object_type="dataset")
# Your file will be saved at: notebooks/dataset/new_test-accents-3-iso-8859-1.csv
aleialib.s3.save_file("new_test-accents-3-iso-8859-1.csv", df, encoding="iso-8859-1", object_type="dataset")
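The encoding parameter matters because the same accented text maps to different bytes in different encodings; a quick standard-library illustration:

```python
# "é" is one byte in iso-8859-1 but two bytes in utf-8
text = "café"

latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(len(latin1_bytes))  # 4
print(len(utf8_bytes))    # 5

# Decoding with the wrong encoding mangles the text (mojibake)
print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©

# Decoding with the right encoding round-trips cleanly
assert latin1_bytes.decode("iso-8859-1") == text
```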
Manage files with a specific format
Some libraries and specific file formats require files to be written to the local file system during execution; those files must then be saved to S3 to be versioned and made available to the track.
Load an xlsx file from S3:
import aleialib
import pandas as pd
in_file = "aleia_ressources_cost.xlsx"
s3_path = "dataset/"
# Load spreadsheet from S3
xlsx_content = aleialib.s3.load_file(s3_path + in_file)
# Convert xlsx to dataframe
xl = pd.ExcelFile(xlsx_content)
df = xl.parse('Licences')
print(df)
Output example:
Loading notebooks/dataset/aleia_ressources_cost.xlsx...
...success
Name Flat price per month Ressources margin \
0 Single 0 1.5
1 Team 2500 1.3
2 Enterprise 4500 1.1
Content
0 Users : 1\nSupport : Online support
1 Users : 5\nSupport : Global support (callback)
2 Users : Unlimited\nSupport : Customer success ...
Create an xlsx file locally and save it to S3:
outputName = 'output.xlsx'
df.to_excel(outputName)
with open(outputName, 'rb') as file:
    df_binary = file.read()
aleialib.s3.save_file(outputName, df_binary, object_type='dataset')
Output example:
Created dataset object at notebooks/dataset/output.xlsx.
True
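The write-locally-then-read-binary pattern above can be sketched with only the standard library (a temporary directory stands in for the notebook’s working directory, and a placeholder payload stands in for the xlsx content):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a file to the local file system, as df.to_excel does above
    out_name = os.path.join(tmpdir, "output.bin")
    with open(out_name, "wb") as f:
        f.write(b"binary spreadsheet payload")

    # Re-open in binary mode to get the raw bytes that
    # save_file expects for non-dataframe content
    with open(out_name, "rb") as f:
        payload = f.read()

print(len(payload))
```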
Text annotation
Use this code to annotate a simple sample text:
from aleialib import nlp
text = "Sample text to be processed..."
annotated_response = nlp.annotate_text(text, "en", "json")
print(annotated_response)
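Assuming the "json" output format returns a JSON string (the exact response structure is not documented here; the keys below are purely hypothetical, so check the contextual help for the real schema), the response can be parsed with the standard json module:

```python
import json

# Hypothetical response string; the real structure returned by
# nlp.annotate_text is documented in the library's contextual help
annotated_response = '{"language": "en", "annotations": [{"text": "Sample", "label": "MISC"}]}'

# Parse the JSON string into a Python dict and walk the annotations
parsed = json.loads(annotated_response)
for annotation in parsed["annotations"]:
    print(annotation["text"], annotation["label"])
```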
DB connectors
You can use db connections that have been created on the platform through the functions of the aleialib.db module.
You can check how to create db connections and the available db connectors here.
Example for an existing PostgreSQL database connection:
# DB Connectors
import aleialib
aleiadbc = aleialib.db.connect("myPostgresqlConnection")
for row in aleiadbc.request("SELECT * FROM dummy_table;"):
    print(row)
Example for an existing MongoDB database connection:
import aleialib
aleiadbc = aleialib.db.connect("myMongoDBConnection")
# users collection in the database
users = aleiadbc.users
for user in users.find():
    print(user)
Access Nam.R data
import aleialib
print(dir(aleialib.namr))
metadata_thematics = aleialib.namr.get_metadata_thematics()
print("\n\nMetadata thematics :\n\n")
print(metadata_thematics)
metadata_user = aleialib.namr.get_metadata_user()
print("\n\nMetadata user :\n\n")
print(metadata_user)
building = aleialib.namr.get_building(id=21302567, fields=["building.roof_type"])
print("\n\nBuilding data :\n\n")
print(building)
geoscope_building = aleialib.namr.list_geoscope_building(
    MUNICIPALITY=["80021"],
    fields=["building.altitude", "building.roof_type"],
    limit=20,
    offset=0
)
print("\n\nBuilding geoscope :\n\n")
print(geoscope_building)
TreeRank algorithm
TreeRank is a high-performance binary classification and scoring algorithm.
It is based on supervised learning: a model is trained on a labeled input data set,
and that model is then used to perform predictions. The classification part of the algorithm is based on Random Forest.
TreeRank optimizes the ROC curve and the AUC.
A sample notebook is provided to showcase the creation of a TreeRank ML model and extraction of various results and metrics.
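The AUC that TreeRank optimizes is easy to illustrate: for binary labels, it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal pure-Python sketch of the metric (not the TreeRank implementation itself):

```python
def auc(labels, scores):
    """AUC as the probability that a positive example outranks
    a negative one (ties count for one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# A scorer that ranks every positive above every negative has AUC 1.0,
# while an uninformative scorer hovers around 0.5
labels = [1, 1, 0, 0]
print(auc(labels, [0.9, 0.8, 0.3, 0.1]))  # 1.0
print(auc(labels, [0.3, 0.8, 0.9, 0.1]))  # 0.5
```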
Read and display an image
Upload the image.png file to the Allonia platform using SFTP, then read and display the image using the following Python code in a notebook:
import io
from IPython.display import display
import aleialib.s3
from PIL import Image
s3client, bucket = aleialib.s3.get_client()
response = s3client.get_object(Bucket=bucket, Key="notebooks/dataset/image.png")
image_object = response["Body"].read()
image = Image.open(io.BytesIO(image_object))
display(image)