title: Discover the OVH Prescience APIs
excerpt: Learn how to manage OVH Prescience APIs
updated: 2018-09-26

Objective

Prescience is a machine learning platform that can be managed through several APIs, allowing you to automate a wide range of actions.

This guide is a detailed introduction to these APIs and shows you how to manage your own OVH Prescience platform.

| API | URL | Description |
|---|---|---|
| Prescience API | https://prescience-api.ai.ovh.net | API for manipulating Prescience “sources”, “datasets” and “models” |
| Prescience Serving | https://prescience-serving.ai.ovh.net | API for evaluating (running inference with) the models generated by Prescience |

Authentication

Using Prescience requires an authentication token. The token is sent as a Bearer token in the Authorization header of every call.

Here is an example of an API call:

curl -X GET "https://prescience-api.ai.ovh.net/project" -H "Authorization: Bearer ${TOKEN}"
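You can prepare the same authenticated call from Python. The sketch below uses only the standard library and builds the request without sending it. The build_request helper is a hypothetical name and is not part of an official Prescience client.

```python
import urllib.request

API_URL = "https://prescience-api.ai.ovh.net"


def build_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Prepare (without sending) an authenticated call to the Prescience API."""
    return urllib.request.Request(
        API_URL + path,
        method=method,
        # The token is sent as a Bearer token, like the curl -H option above.
        headers={"Authorization": f"Bearer {token}"},
    )
```

To send it, pass the result to urllib.request.urlopen, for example urlopen(build_request("/project", os.environ["TOKEN"])). This assumes that, as in the curl example, the token is stored in the TOKEN environment variable.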

OVH Prescience API

Sources

The “source” object is the result of a parsing task. The API returns it with the following fields:

| Field | Description | Type | Orderable | Filterable |
|---|---|---|---|---|
| source_id | Source identifier | String | Yes | No |
| input_url | Internal URL of the pre-parsing file | String | No | No |
| source_url | Internal URL of the file resulting from the parsing | String | No | No |
| input_type | Type of source file | String | Yes | No |
| headers | Whether the pre-parsing file contains a header row | Boolean | Yes | No |
| separator | Separator of the pre-parsing file (CSV only) | String | No | No |
| diagram | JSON string describing the schema of the source | String | No | No |
| status | Source status | Status | Yes | No |
| last_update | Date of the last update | Timestamp | Yes | No |
| created_at | Creation date | Timestamp | Yes | No |
| total_step | Total number of steps in the parsing process | Integer | No | No |
| current_step | Current step in the parsing process | Integer | No | No |
| current_step_description | Description of the current step in the parsing process | String | No | No |

List of sources:

GET https://prescience-api.ai.ovh.net/source

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| Page | Integer | Query | No | 1 | Page number | 2 |
| Size | Integer | Query | No | 100 | Number of items per page | 50 |
| Sort_column | String | Query | No | created_at | Field by which results are sorted | source_id |
| Sort_direction | String | Query | No | | Direction in which results are sorted | |
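You can assemble a listing URL from these query parameters. This is a minimal sketch. The parameter names are copied as written in the table above, and the API may expect a different capitalisation. list_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "https://prescience-api.ai.ovh.net"


def list_url(resource: str, page: int = 1, size: int = 100, **extra) -> str:
    """Build a listing URL such as GET /source?Page=2&Size=50.

    `extra` holds the other query parameters (sorting, filters);
    parameters whose value is None are left out of the query string.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    params = {"Page": page, "Size": size}
    params.update({name: value for name, value in extra.items() if value is not None})
    return f"{API_URL}/{resource}?{urlencode(params)}"
```

The same helper works for the dataset and model listings, for example list_url("dataset", Dataset_id="dataset") for the LIKE filter.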

Source retrieval:

GET https://prescience-api.ai.ovh.net/source/{id_source}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| id_source | String | Path | Yes | | Source identifier | my-source |

Source deletion:

DELETE https://prescience-api.ai.ovh.net/source/{id_source}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| id_source | String | Path | Yes | | Source identifier | my-source |

Datasets

The “dataset” object is the result of a “preprocessing” task. The API returns it with the following fields:

| Field | Description | Type | Orderable | Filterable |
|---|---|---|---|---|
| dataset_id | Dataset identifier | String | Yes | Yes |
| source | “Source” object from which the dataset was generated | Source | No | Yes |
| dataset_url | Internal URL of the file resulting from the preprocess | String | No | No |
| transformation_url | Internal URL of the transformation PMML file | String | No | No |
| label_id | Identifier of the “label” column | String | Yes | No |
| problem_type | Type of machine learning problem (“Classification” / “Regression”) | String | Yes | No |
| nb_fold | Number of folds created by the preprocess | Integer | Yes | No |
| selected_columns | List of columns selected from the source | String[] | No | No |
| diagram | JSON string describing the schema of the dataset | String | No | No |
| status | Dataset status | Status | Yes | No |
| last_update | Date of the last update | Timestamp | Yes | No |
| created_at | Creation date | Timestamp | Yes | No |
| total_step | Total number of steps in the preprocess | Integer | No | No |
| current_step | Current step of the preprocess operation | Integer | No | No |
| current_step_description | Description of the current step in the preprocess operation | String | No | No |

List of datasets:

GET https://prescience-api.ai.ovh.net/dataset/

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| Page | Integer | Query | No | 1 | Page number | 2 |
| Size | Integer | Query | No | 100 | Number of items per page | 50 |
| Sort_column | String | Query | No | created_at | Field by which results are sorted | source_id |
| Sort_direction | String | Query | No | | Direction in which results are sorted | |
| Dataset_id | String | Query | No | | Filter on the dataset name (LIKE search) | dataset |
| Source_id | String | Query | No | | Filter on the name of the dataset's source (LIKE search) | source |
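Results are paginated, so retrieving every dataset means requesting pages until one comes back short. This sketch makes two assumptions. Each page is returned as a list of objects, and fetch_page is a hypothetical callable that you provide (for example, a GET on /dataset with the Page and Size parameters).

```python
from typing import Callable, Iterator, List


def iter_all(fetch_page: Callable[[int], List[dict]], size: int = 100) -> Iterator[dict]:
    """Yield every item of a paginated listing, one page at a time."""
    page = 1  # pages start at 1, the default "Page" value in the table
    while True:
        items = fetch_page(page)
        yield from items
        # A short (or empty) page means there is nothing left to fetch.
        if len(items) < size:
            return
        page += 1
```

When the number of items is an exact multiple of size, the loop requests one extra page, which comes back empty. It never stops early on a full page.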

Dataset retrieval:

GET https://prescience-api.ai.ovh.net/dataset/{id_dataset}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| id_dataset | String | Path | Yes | | Dataset identifier | my_dataset |

Deleting a dataset:

DELETE https://prescience-api.ai.ovh.net/dataset/{id_dataset}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| id_dataset | String | Path | Yes | | Dataset identifier | my_dataset |

Models

The “model” object is the result of a “train” task. The API returns it with the following fields:

| Field | Description | Type | Orderable | Filterable |
|---|---|---|---|---|
| model_id | Model identifier | String | Yes | No |
| dataset | “Dataset” object from which the model was generated | Dataset | No | Yes |
| label_id | Identifier of the “label” column | String | Yes | No |
| config | “Config” object used to generate the model | Config | No | No |
| status | Model status | Status | Yes | No |
| last_update | Date of the last update | Timestamp | Yes | No |
| created_at | Creation date | Timestamp | Yes | No |
| total_step | Total number of steps in the “train” process | Integer | No | No |
| current_step | Current step of the “train” process | Integer | No | No |
| current_step_description | Description of the current step of the “train” process | String | No | No |

The “config” object describes the configuration used to generate the machine learning model.

| Field | Description | Type |
|---|---|---|
| name | Name of the algorithm used | String |
| class_identifier | Internal identifier | String |
| kwargs | Model hyperparameters | Dictionary |
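To compare trained models at a glance, you can turn the config object into a one-line summary. This minimal sketch assumes the model has already been retrieved and decoded from JSON. The algorithm name, identifier and hyperparameters in the example model are made up for illustration.

```python
def describe_config(model: dict) -> str:
    """Summarise the "config" object of a model returned by GET /model/{id_model}."""
    config = model["config"]
    # kwargs holds the hyperparameters; sorting them makes the output stable.
    hyperparameters = ", ".join(
        f"{name}={value!r}" for name, value in sorted(config.get("kwargs", {}).items())
    )
    return f"{config['name']} [{config['class_identifier']}] ({hyperparameters})"


# Hypothetical model, for illustration only.
example = {"model_id": "my-model",
           "config": {"name": "xgboost", "class_identifier": "xgb",
                      "kwargs": {"max_depth": 3, "eta": 0.1}}}
print(describe_config(example))  # xgboost [xgb] (eta=0.1, max_depth=3)
```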

Model list:

GET https://prescience-api.ai.ovh.net/model

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| Page | Integer | Query | No | 1 | Page number | 2 |
| Size | Integer | Query | No | 100 | Number of items per page | 50 |
| Sort_column | String | Query | No | created_at | Field by which results are sorted | model_id |
| Sort_direction | String | Query | No | | Direction in which results are sorted | |
| Dataset_id | String | Query | No | | Filter on the dataset name (LIKE search) | dataset |

Model retrieval:

GET https://prescience-api.ai.ovh.net/model/{id_model}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| id_model | String | Path | Yes | | Model identifier | my_model |

Deleting a model:

DELETE https://prescience-api.ai.ovh.net/model/{id_model}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| id_model | String | Path | Yes | | Model identifier | my_model |

Parsing

To create a “source”, you need to launch a parsing task.

POST https://prescience-api.ai.ovh.net/ml/upload/source

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| parse.source_id | String | Multipart, "parse" JSON part | Yes | | Source name | my-source |
| parse.input_type | String | Multipart, "parse" JSON part | Yes | | File format: CSV or Parquet only | csv |
| parse.separator | String | Multipart, "parse" JSON part | No | , | Separator, for CSV files | ; |
| files | Files | Multipart, one part per file named input-file-{index} | No | | Files to upload (several are allowed) | input-file-0 |

For example, assuming that the “data-1.csv” and “data-2.csv” CSV files are in the current directory:

parse.json file

{
    "source_id": "my-source",
    "input_type": "csv",
    "separator": ","
}
curl -H "Authorization: Bearer ${TOKEN}" -v \
    -F parse='@parse.json;type=application/json' \
    -F input-file-0=@data-1.csv \
    -F input-file-1=@data-2.csv \
    https://prescience-api.ai.ovh.net/ml/upload/source
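You can reproduce the same upload without curl. This sketch builds the multipart/form-data body with the Python standard library. It writes one JSON part named parse, then one part per file named input-file-0, input-file-1 and so on, as in the table above. The helper name and the application/octet-stream content type for the data files are assumptions.

```python
import json
import uuid
from pathlib import Path
from typing import List, Tuple


def build_parse_body(parse: dict, files: List[Path]) -> Tuple[bytes, str]:
    """Return (body, content_type) for POST /ml/upload/source."""
    boundary = uuid.uuid4().hex
    chunks: List[bytes] = []

    def add_part(headers: List[str], payload: bytes) -> None:
        head = "".join(f"{header}\r\n" for header in headers)
        chunks.append(f"--{boundary}\r\n{head}\r\n".encode() + payload + b"\r\n")

    # The "parse" part carries the JSON settings (source_id, input_type, separator).
    add_part(['Content-Disposition: form-data; name="parse"; filename="parse.json"',
              "Content-Type: application/json"],
             json.dumps(parse).encode())
    # Each data file gets its own part: input-file-0, input-file-1, ...
    for index, path in enumerate(files):
        add_part([f'Content-Disposition: form-data; name="input-file-{index}"; '
                  f'filename="{path.name}"',
                  "Content-Type: application/octet-stream"],
                 path.read_bytes())
    body = b"".join(chunks) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```

To send the result, create a POST urllib.request.Request with data=body. Set its Content-Type header to the returned content_type, alongside the usual Authorization header.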

Warning

The source returned in the response is incomplete. Because the task is asynchronous, its fields are filled in as parsing progresses.
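Parsing, preprocessing and training are asynchronous, so a client typically polls the object until the task is finished. This minimal sketch assumes you supply two hypothetical callables. fetch re-reads the object (for example, with GET /source/{id_source}). is_done decides from the status field whether the task has ended. This guide does not list the possible status values. BUILT appears later only as a filter example for evaluation results, so treating it as the final status is an assumption.

```python
import time
from typing import Callable


def wait_until_done(fetch: Callable[[], dict],
                    is_done: Callable[[dict], bool],
                    interval: float = 5.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll an asynchronous Prescience object until is_done(obj) is true."""
    deadline = time.monotonic() + timeout
    while True:
        obj = fetch()
        # total_step / current_step report progress while the task runs.
        if obj.get("total_step"):
            print(f"step {obj.get('current_step')}/{obj['total_step']}: "
                  f"{obj.get('current_step_description', '')}")
        if is_done(obj):
            return obj
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        sleep(interval)
```

The sleep parameter is injectable so you can test the loop without waiting.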

Preprocess

To create a "dataset", you must first have generated a "source", and then launch a preprocess task on it.

POST https://prescience-api.ai.ovh.net/ml/preprocess/{source_id}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| source_id | String | Path | Yes | | Name of the source to preprocess | my-source |
| dataset_id | String | JSON body | Yes | | Name of the dataset to create | my-big-dataset |
| label_id | String | JSON body | Yes | | Identifier of the label (target) column | my-label |
| nb_fold | Integer | JSON body | No | 10 | Number of folds to create during preprocessing | 6 |
| problem_type | String | JSON body | Yes | | Type of machine learning problem (classification or regression) | regression |
| selected_columns | String[] | JSON body | No | [] | Columns to keep in the dataset; by default, all columns are selected | ["column_1", "column_2"] |

For example:

preprocess.json file

{
    "dataset_id": "my-dataset",
    "label_id": "my-label",
    "problem_type": "classification"
}
curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Content-Type:application/json" \
     -X POST https://prescience-api.ai.ovh.net/ml/preprocess/my-source \
     --data-binary "@preprocess.json"

Warning

The dataset returned in the response is incomplete. Because the task is asynchronous, its fields are filled in as preprocessing progresses.
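Before launching a preprocess, you can build and check its request locally. This sketch follows the parameter table above. It assumes problem_type is sent in lowercase, as in the example, and that nb_fold is sent as an integer. preprocess_request is a hypothetical helper.

```python
import json
from urllib.parse import quote

API_URL = "https://prescience-api.ai.ovh.net"
PROBLEM_TYPES = {"classification", "regression"}


def preprocess_request(source_id, dataset_id, label_id, problem_type,
                       nb_fold=None, selected_columns=None):
    """Return (url, json_body) for POST /ml/preprocess/{source_id}."""
    if problem_type not in PROBLEM_TYPES:
        raise ValueError(f"problem_type must be one of {sorted(PROBLEM_TYPES)}")
    body = {"dataset_id": dataset_id, "label_id": label_id,
            "problem_type": problem_type}
    # Optional fields are only sent when set, so the documented defaults
    # (10 folds, all columns selected) apply otherwise.
    if nb_fold is not None:
        body["nb_fold"] = nb_fold
    if selected_columns:
        body["selected_columns"] = list(selected_columns)
    # source_id is a path segment, so it is percent-encoded.
    url = f"{API_URL}/ml/preprocess/{quote(source_id, safe='')}"
    return url, json.dumps(body)
```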

Optimisation

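Once the dataset has been created, you can launch an optimisation task on it. This task evaluates candidate model configurations on the dataset so that you can pick the best one.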

POST https://prescience-api.ai.ovh.net/ml/optimize/{dataset_id}

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| dataset_id | String | Path | Yes | | Name of the dataset to optimise | my-big-dataset |
| scoring_metric | String | JSON body | Yes | | Optimisation metric: mae, mse or R2 for regression; accuracy, f1 or roc_auc for classification | roc_auc |
| budget | Integer | JSON body | No | 6 | Budget allocated to the optimisation | 10 |

For example:

optimize.json file

{
    "scoring_metric": "roc_auc",
    "budget": 6
}
curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Content-Type:application/json" \
     -X POST https://prescience-api.ai.ovh.net/ml/optimize/my-big-dataset \
     --data-binary "@optimize.json"

Warning

The optimisation task returns an "Optimization" object. Once the optimisation is complete, you can query the "Evaluation-Result" objects to find the best configuration.
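Which scoring metrics are valid depends on the problem type chosen during preprocessing. You can check the pair before launching an optimisation. This sketch assumes the metric names are sent in lowercase (the table writes R2, so "r2" is an assumption) and that budget must be a positive integer. optimize_body is a hypothetical helper.

```python
import json

# Metrics accepted for each problem type, as listed in the table above.
# Sending "r2" in lowercase is an assumption (the table writes R2).
METRICS = {
    "regression": {"mae", "mse", "r2"},
    "classification": {"accuracy", "f1", "roc_auc"},
}


def optimize_body(problem_type: str, scoring_metric: str, budget: int = 6) -> str:
    """Build the JSON body for POST /ml/optimize/{dataset_id}."""
    allowed = METRICS[problem_type]
    if scoring_metric not in allowed:
        raise ValueError(f"{scoring_metric!r} is not a {problem_type} metric; "
                         f"choose one of {sorted(allowed)}")
    if budget < 1:
        raise ValueError("budget must be a positive integer")
    return json.dumps({"scoring_metric": scoring_metric, "budget": budget})
```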

Evaluation Result

The "Evaluation-Result" object is the result of an optimisation task. The API returns it with the following fields:

| Field | Description | Type |
|---|---|---|
| uuid | UUID of the evaluation | String |
| spent_time | Time spent evaluating the configuration | Integer |
| costs | Metrics obtained by the configuration | Dictionary |
| config | Tested configuration | Config |
| status | Evaluation status | Status |
| last_update | Date of the last update | Timestamp |
| created_at | Creation date | Timestamp |
| total_step | Total number of steps in the optimisation process | Integer |
| current_step | Current step of the optimisation process | Integer |
| current_step_description | Description of the current step of the optimisation process | String |

Evaluation list:

GET https://prescience-api.ai.ovh.net/evaluation-result

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| Dataset_id | String | Query | Yes | | Dataset whose evaluations are listed | my-big-dataset |
| Page | Integer | Query | No | 1 | Page number | 2 |
| Size | Integer | Query | No | 100 | Number of items per page | 50 |
| Sort_column | String | Query | No | created_at | Field by which results are sorted | source_id |
| Sort_direction | String | Query | No | | Direction in which results are sorted | |
| Status | String | Query | No | | Only return evaluations with this status | BUILT |
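You can automate the choice of the best configuration once the evaluation results have been listed. This sketch assumes that costs stores the raw metric values keyed by metric name, and that mae and mse are minimised while the other metrics are maximised. If the platform stores every metric as a cost to minimise, use min throughout. best_evaluation is a hypothetical helper.

```python
from typing import List

# mae and mse are errors (lower is better); the other listed metrics are
# scores (higher is better). This split is an assumption about "costs".
LOWER_IS_BETTER = {"mae", "mse"}


def best_evaluation(results: List[dict], metric: str) -> dict:
    """Return the Evaluation-Result with the best value of costs[metric]."""
    scored = [r for r in results if metric in r.get("costs", {})]
    if not scored:
        raise ValueError(f"no evaluation result reports {metric!r}")
    choose = min if metric in LOWER_IS_BETTER else max
    return choose(scored, key=lambda r: r["costs"][metric])
```

You then pass the uuid of the selected result as evaluation_uuid when training the model.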

Training

After choosing the best configuration from the list of "Evaluation-Result" objects, you can train a model with it:

POST https://prescience-api.ai.ovh.net/ml/train

| Parameter | Type | In | Required | Default | Description | Example |
|---|---|---|---|---|---|---|
| model_id | String | Query | Yes | | Name of the model to create | my-model |
| evaluation_uuid | String | Query | Yes | | Evaluation-Result identifier | bcaef619-4bf3-4c15-b49f-bc325f98d891 |
| dataset_id | String | Query | No | Dataset linked to the Evaluation-Result | Only needed to train on a different dataset from the one used by the Evaluation-Result | my-alternative-dataset |

For example:

curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Content-Type:application/json" \
     -X POST "https://prescience-api.ai.ovh.net/ml/train/?model_id=my-model&evaluation_uuid=bcaef619-4bf3-4c15-b49f-bc325f98d891"

Warning

The training task returns an incomplete model object. Because the task is asynchronous, the model is completed as training progresses.
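The training call only takes query parameters, so you can build its URL with urlencode, which also escapes identifiers safely. This sketch assumes the /ml/train path given on the endpoint line above. train_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://prescience-api.ai.ovh.net"


def train_url(model_id: str, evaluation_uuid: str,
              dataset_id: Optional[str] = None) -> str:
    """Build the POST /ml/train URL; all parameters go in the query string."""
    params = {"model_id": model_id, "evaluation_uuid": evaluation_uuid}
    # Only set dataset_id to train on a dataset other than the evaluation's one.
    if dataset_id is not None:
        params["dataset_id"] = dataset_id
    return f"{API_URL}/ml/train?{urlencode(params)}"
```

Because the URL contains an "&", it must be quoted when passed to curl in a shell.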

OVH Prescience Serving API


Once a model is trained, it can be used to make inferences.

Warning

Both APIs expose a "model" object, but the two objects have different structures. Only the model_id identifier is common to both.

Model description:

POST https://prescience-serving.ai.ovh.net/model/{model_id}

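The response describes the model as seen by Prescience Serving. It includes the input fields the model expects (argumentsFields) and the fields produced by the transformation (transformFields).

Example response: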

{
	"id": "model",
	"properties": {
		"created.timestamp": 1537170170985,
		"accessed.timestamp": null,
		"file.size": 3737,
		"file.md5sum": "a13e6e482bb2e62d1376b502f8cbc8a2"
	},
	"schema": {
		"argumentsFields": [{
			"id": "hours-per-week",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "capital-gain",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "education-num",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "age",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "fnlwgt",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "capital-loss",
			"dataType": "integer",
			"opType": "ordinal"
		}],
		"transformFields": [{
			"id": "imputed_hours-per-week",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "imputed_capital-gain",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "imputed_education-num",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "imputed_age",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "imputed_fnlwgt",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "imputed_capital-loss",
			"dataType": "integer",
			"opType": "ordinal"
		}, {
			"id": "scaled_imputed_hours-per-week",
			"dataType": "double",
			"opType": "continuous"
		}, {
			"id": "scaled_imputed_capital-gain",
			"dataType": "double",
			"opType": "continuous"
		}, {
			"id": "scaled_imputed_education-num",
			"dataType": "double",
			"opType": "continuous"
		}, {
			"id": "scaled_imputed_age",
			"dataType": "double",
			"opType": "continuous"
		}, {
			"id": "scaled_imputed_fnlwgt",
			"dataType": "double",
			"opType": "continuous"
		}, {
			"id": "scaled_imputed_capital-loss",
			"dataType": "double",
			"opType": "continuous"
		}]
	}
}
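The argumentsFields section of this description tells you which raw fields a query must contain and their types. This sketch uses it to check raw values (for example, strings read from a CSV file) and convert them before you build the arguments object. Only integer and double appear in the example above. The other dataType names in the mapping are assumptions.

```python
from typing import Any, Callable, Dict

# Converters keyed by dataType; only "integer" and "double" appear in the
# example description, the other entries are assumptions.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "integer": int,
    "double": float,
    "float": float,
    "string": str,
}


def coerce_arguments(description: dict, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Check raw values against argumentsFields and convert them to their dataType."""
    fields = description["schema"]["argumentsFields"]
    missing = [field["id"] for field in fields if field["id"] not in raw]
    if missing:
        raise KeyError(f"missing arguments: {missing}")
    # Unknown dataTypes are passed through unchanged; extra raw keys are ignored.
    return {field["id"]: CONVERTERS.get(field["dataType"], lambda v: v)(raw[field["id"]])
            for field in fields}
```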

Model evaluation

Warning

During the preprocessing stage, the data is transformed. The model is trained on the output of this transformation, so any input data must be transformed in the same way before it is passed to the model. Prescience Serving provides methods to perform the transformation, the inference, or both.

The Serving platform allows you to perform the following:

  • Transformation and evaluation
  • Evaluation only
  • Transformation only

| Method | URL | Description |
|---|---|---|
| POST | https://prescience-serving.ai.ovh.net/eval/{model_id}/model | Unit inference |
| POST | https://prescience-serving.ai.ovh.net/eval/{model_id}/model/batch/csv | Batch inference from a CSV file |
| POST | https://prescience-serving.ai.ovh.net/eval/{model_id}/model/batch/json | Batch inference from a JSON table |
| POST | https://prescience-serving.ai.ovh.net/eval/{transform_id}/transform | Unit transformation |
| POST | https://prescience-serving.ai.ovh.net/eval/{transform_id}/transform/batch/csv | Batch transformation from a CSV file |
| POST | https://prescience-serving.ai.ovh.net/eval/{transform_id}/transform/batch/json | Batch transformation from a JSON table |
| POST | https://prescience-serving.ai.ovh.net/eval/{transform_model_id}/transform-model | Unit transformation followed by inference with the model |
| POST | https://prescience-serving.ai.ovh.net/eval/{transform_model_id}/transform-model/batch/csv | Batch transformation followed by inference, from a CSV file |
| POST | https://prescience-serving.ai.ovh.net/eval/{transform_model_id}/transform-model/batch/json | Batch transformation followed by inference, from a JSON table |

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| id | String | JSON body | No | Request identifier |
| arguments | Dict | JSON body | Yes | Input values, keyed by field identifier |
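The nine endpoints follow a regular pattern: /eval/{identifier}/{mode}, optionally followed by /batch/csv or /batch/json. This minimal sketch picks the URL and builds a unit request body. The helper names are hypothetical. The sketch treats model_id, transform_id and transform_model_id the same way, because this guide does not define them further.

```python
import json
from typing import Any, Dict, Optional

SERVING_URL = "https://prescience-serving.ai.ovh.net"
MODES = ("model", "transform", "transform-model")


def eval_url(identifier: str, mode: str = "transform-model",
             batch: Optional[str] = None) -> str:
    """Pick an evaluation endpoint from the table above.

    mode: "model" (inference only), "transform" (transformation only) or
    "transform-model" (both); batch: None for a unit call, "csv" or "json".
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    url = f"{SERVING_URL}/eval/{identifier}/{mode}"
    if batch is not None:
        if batch not in ("csv", "json"):
            raise ValueError('batch must be "csv", "json" or None')
        url += f"/batch/{batch}"
    return url


def unit_payload(arguments: Dict[str, Any], request_id: Optional[str] = None) -> str:
    """JSON body for a unit call: required "arguments", optional "id"."""
    payload: Dict[str, Any] = {"arguments": dict(arguments)}
    if request_id is not None:
        payload["id"] = request_id
    return json.dumps(payload)
```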

Example of unit inference:

example.json file

{
	"arguments": {
		"hours-per-week": 1,
		"capital-gain": 1,
		"education-num": 1,
		"age": 1,
		"fnlwgt": 1,
		"capital-loss": 1
	}
}

Request:

curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Content-Type:application/json" \
     -X POST https://prescience-serving.ai.ovh.net/eval/my-model/transform-model \
     --data-binary "@example.json"

Example of batch evaluation from a JSON table:

example.json file

[
    {
        "id": "eval-1",
        "arguments": {
            "hours-per-week": 1,
            "capital-gain": 1,
            "education-num": 1,
            "age": 1,
            "fnlwgt": 1,
            "capital-loss": 1
        }
    },
    {
        "id": "eval-2",
        "arguments": {
            "hours-per-week": 1,
            "capital-gain": 1,
            "education-num": 1,
            "age": 1,
            "fnlwgt": 1,
            "capital-loss": 1
        }
    }
]

Request:

curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Content-Type:application/json" \
     -X POST https://prescience-serving.ai.ovh.net/eval/my-model/transform-model/batch/json \
     --data-binary "@example.json"
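You can generate batch JSON tables like the one above from existing records. This sketch produces ids of the form eval-1, eval-2 and so on, as in the example. That any unique string is accepted as the optional id is an assumption. batch_payload is a hypothetical helper.

```python
import json
from typing import Iterable, Mapping


def batch_payload(rows: Iterable[Mapping], prefix: str = "eval-") -> str:
    """Build the JSON table for a .../batch/json call, one entry per row."""
    return json.dumps([
        # Each entry pairs an id (eval-1, eval-2, ...) with the row's arguments.
        {"id": f"{prefix}{index}", "arguments": dict(row)}
        for index, row in enumerate(rows, start=1)
    ])
```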

Go further

Join our community of users on https://community.ovh.com/en/.