Allonia Python utilities

The Allonia Python Library is designed to work within the Allonia platform (from notebooks, modules, and user-services). It provides several key features:

  • S3 access: load and save data from your track’s storage bucket

  • File encoding I/O management

  • Text annotation: use our partner’s NLP algorithm to annotate text

  • Service access: call your user-services from executed code and retrieve their answers

  • Access our partner Nam.R’s API

  • Module: useful functions to parameterize modules

Overall helper

You can access detailed information about the existing functions of aleialib at any time through the default Python helper:

help(aleialib)
help(aleialib.db)
help(aleialib.s3)

Result example with help(aleialib):

Help on package aleialib:

NAME
    aleialib

PACKAGE CONTENTS
    db (package)
    enums
    helpers
    namr
    nlp
    processor
    s3 (package)
    user_services (package)

DATA
    __all__ = ['s3', 'nlp', 'helpers', 'processor', 'user_services', 'db',...

FILE
    /home/aleia-user/aleialib/__init__.py

More information is available directly in the library documentation, which you can access through JupyterLab's contextual help:

  • Open the "Help" menu and select "Show Contextual Help", or use the keyboard shortcut Ctrl + I

  • Select in your code the function you want more information about

S3 access

Use this code to load the dataset.csv file, then save its first two lines as csv-test_output.csv in your track’s storage bucket.

import aleialib

# With the file located at notebooks/dataset/dataset.csv
# With handle_type=True, your file will be converted to a dataframe
df_input = aleialib.s3.load_file("dataset.csv", object_type="dataset", handle_type=True)

# Without the "object_type" parameter, the path starts from the notebooks/ folder
# With the file located at notebooks/dataset/dataset.csv
df_input = aleialib.s3.load_file("dataset/dataset.csv", handle_type=True)

df_output = df_input[:2]

# With handle_type=True, your file will be converted according to its file extension, here csv
# Your file will be saved at notebooks/dataset/csv-test_output.csv
aleialib.s3.save_file("csv-test_output.csv", df_output, object_type="dataset", handle_type=True)

Manage files with specific encoding

Files with a specific encoding must be loaded and saved through aleialib to be handled properly.

Load a file with specific encoding:

import aleialib

# With the file located at notebooks/dataset/test-accents-3-iso-8859-1.csv
df = aleialib.s3.load_file("test-accents-3-iso-8859-1.csv", "iso-8859-1", object_type="dataset")

# Your file will be saved at notebooks/dataset/new_test-accents-3-iso-8859-1.csv
aleialib.s3.save_file("new_test-accents-3-iso-8859-1.csv", df, encoding="iso-8859-1", object_type="dataset")
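The encoding parameter matters because the same bytes decode differently, or not at all, under different codecs. A stdlib-only sketch of what goes wrong when an ISO-8859-1 file is read as UTF-8:

```python
text = "éàç"

# Bytes as they would be stored in an ISO-8859-1 file
raw = text.encode("iso-8859-1")

# Decoding with the right codec round-trips cleanly
print(raw.decode("iso-8859-1"))

# Decoding with the wrong codec fails outright
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("utf-8 decode failed")
```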

Manage files with specific format

Some libraries and specific file formats need to be written to the local filesystem during execution; these files must then be saved to S3 to be versioned and made available to the track.

Load an xlsx file from S3:

import aleialib
import pandas as pd
in_file = "aleia_ressources_cost.xlsx"
s3_path = "dataset/"
data = aleialib.s3.load_file(s3_path+in_file)

# Load spreadsheet
xlsx_content = data

# Convert xlsx to dataframe
xl = pd.ExcelFile(xlsx_content)

df = xl.parse('Licences')

print(df)

Output example:

Loading notebooks/dataset/aleia_ressources_cost.xlsx...
...success
         Name  Flat price per month  Ressources margin  \
0      Single                     0                1.5
1        Team                  2500                1.3
2  Enterprise                  4500                1.1

                                             Content
0                Users : 1\nSupport : Online support
1     Users : 5\nSupport : Global support (callback)
2  Users : Unlimited\nSupport : Customer success ...
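If load_file hands back raw bytes rather than a file-like object, pandas can still parse them by wrapping them in io.BytesIO. The sketch below builds a small workbook in memory as a stand-in for the S3 download; the sheet name Licences mirrors the example above:

```python
import io

import pandas as pd

# Build a small workbook in memory (stand-in for bytes loaded from S3)
buf = io.BytesIO()
pd.DataFrame(
    {"Name": ["Single", "Team"], "Flat price per month": [0, 2500]}
).to_excel(buf, sheet_name="Licences", index=False)
buf.seek(0)

# Parse a named sheet, as in the snippet above
xl = pd.ExcelFile(buf)
df = xl.parse("Licences")
print(df["Name"].tolist())
```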

Create an xlsx file on local & save it on S3:

outputName = 'output.xlsx'
df.to_excel(outputName)

with open(outputName, 'rb') as file:
    df_binary = file.read()

aleialib.s3.save_file(outputName, df_binary, object_type='dataset')

Output example:

Created dataset object at notebooks/dataset/output.xlsx.
True

Text annotation

Use this code to annotate a simple sample text:

from aleialib import nlp

text = "Sample text to be processed..."
annotated_response = nlp.annotate_text(text, "en", "json")
print(annotated_response)
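Assuming the "json" output format returns a JSON string, the response can be parsed with the standard library. The payload shape below (an "entities" list) is purely illustrative and not the actual schema of the partner NLP service:

```python
import json

# Hypothetical annotated response; the real payload shape depends on the
# partner NLP service and is only illustrated here
annotated_response = (
    '{"entities": [{"text": "Sample", "label": "MISC", "start": 0, "end": 6}]}'
)

result = json.loads(annotated_response)
for entity in result["entities"]:
    print(entity["label"], entity["text"])
```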

DB connectors

You can use database connections that have been created on the platform through the functions of the aleialib.db module.

You can check how to create db connections and which db connectors are available here.

Example for an existing PostgreSQL database connection:

# DB Connectors
import aleialib

aleiadbc = aleialib.db.connect("myPostgresqlConnection")

for row in aleiadbc.request("SELECT * FROM dummy_table;"):
    print(row)
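The same execute-and-iterate pattern can be tried locally with the stdlib sqlite3 driver, where cursor execution plays the role of the connector's request() method (a sketch for comparison, not the aleialib API):

```python
import sqlite3

# In-memory database standing in for the platform connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummy_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dummy_table VALUES (?, ?)", [(1, "a"), (2, "b")])

# Iterate over rows, as with aleiadbc.request(...) above
for row in conn.execute("SELECT * FROM dummy_table;"):
    print(row)
```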

Example for an existing MongoDB database connection:

import aleialib

aleiadbc = aleialib.db.connect("myMongoDBConnection")

# users collection in the database
users = aleiadbc.users

for user in users.find():
    print(user)

Access Nam.R data

import aleialib

print(dir(aleialib.namr))

metadata_thematics = aleialib.namr.get_metadata_thematics()
print("\n\nMetadata thematics :\n\n")
print(metadata_thematics)

metadata_user = aleialib.namr.get_metadata_user()
print("\n\nMetadata user :\n\n")
print(metadata_user)

building = aleialib.namr.get_building(id=21302567, fields=["building.roof_type"])
print("\n\nBuilding data :\n\n")
print(building)

geoscope_building = aleialib.namr.list_geoscope_building(
    MUNICIPALITY=["80021"],
    fields=["building.altitude", "building.roof_type"],
    limit=20,
    offset=0
)
print("\n\nBuilding geoscope :\n\n")
print(geoscope_building)

TreeRank algorithm

TreeRank is a high-performance binary classification and scoring algorithm.

It is based on supervised learning: machine learning is applied to a labeled input data set, generating a model.

The model is then used to perform predictions. The classification part of the algorithm is based on Random Forest.

TreeRank allows optimization of the ROC curve and the AUC.

A sample notebook is provided to showcase the creation of a TreeRank ML model and extraction of various results and metrics.
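As a reminder of the metric TreeRank optimizes: the AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A pure-Python sketch of that definition:

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative (ties count 0.5). O(n^2) for clarity, not efficiency."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A scorer that ranks every positive above every negative has AUC 1.0
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 1.0
# One positive/negative pair misordered out of four: AUC 0.75
print(auc([0.9, 0.4, 0.35, 0.1], [1, 0, 1, 0]))  # 0.75
```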

Read and display an image

Upload the image.png file to the Allonia platform using SFTP, then read and display the image using the following Python code in a notebook:

import io
from IPython.display import display
import aleialib.s3
from PIL import Image

s3client, bucket = aleialib.s3.get_client()
response = s3client.get_object(Bucket=bucket, Key="notebooks/dataset/image.png")
image_object = response["Body"].read()
image = Image.open(io.BytesIO(image_object))

display(image)
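The io.BytesIO round-trip works in both directions. The sketch below builds a tiny PNG in memory instead of fetching it from S3, then re-opens it from raw bytes exactly as the snippet above does with the S3 response body:

```python
import io

from PIL import Image

# Create a small image in memory (stand-in for bytes fetched from S3)
img = Image.new("RGB", (4, 4), color=(255, 0, 0))
buf = io.BytesIO()
img.save(buf, format="PNG")
png_bytes = buf.getvalue()

# Re-open from raw bytes, as done with response["Body"].read() above
image = Image.open(io.BytesIO(png_bytes))
print(image.format, image.size)
```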