There are three interfaces for interacting with DeepSparse:
- **Engine** is the lowest-level API. It enables you to compile a model and run inference on raw input tensors (see the Engine example below).
- **Pipeline** is the default DeepSparse API. Similar to Hugging Face Pipelines, it wraps Engine with task-specific pre-processing and post-processing steps, allowing you to make requests on raw data and receive post-processed predictions.
- **Server** is a REST API wrapper around Pipelines built on FastAPI and Uvicorn. It enables you to start a model-serving endpoint running DeepSparse with a single CLI command.
This directory offers examples of each API across the supported tasks. DeepSparse supports many tasks out of the box; the examples below use sentiment analysis.
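### Engine Example | Sentiment Analysis

Before the Pipeline and Server examples, here is a minimal Engine sketch for the same sentiment analysis model. It is a sketch under the assumption that DeepSparse's `compile_model` helper and its `generate_random_inputs`/`model_to_path` utilities are available; since Engine operates on raw tensors, the random-input helper stands in for real pre-processed data.

```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs, model_to_path

# SparseZoo stub for the sentiment analysis model used throughout this page
model_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"
batch_size = 1

# compile the ONNX model for accelerated CPU inference
engine = compile_model(model_stub, batch_size=batch_size)

# Engine consumes raw input tensors (a list of NumPy arrays); random
# inputs matching the model's expected shapes stand in for real data
inputs = generate_random_inputs(model_to_path(model_stub), batch_size)
outputs = engine.run(inputs)
print(outputs)  # raw class scores, with no pre- or post-processing
```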
### Pipeline Example | Sentiment Analysis
Here's an example of how a task is used to create a Pipeline:
```python
from deepsparse import Pipeline

# download the model from SparseZoo and set up the pipeline
pipeline = Pipeline.create(
    task="sentiment_analysis",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

# run inference on raw text and print the post-processed prediction
print(pipeline("I love DeepSparse Pipelines!"))
# labels=['positive'] scores=[0.998009443283081]
```
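Pipelines generally also accept a batch of inputs; a minimal sketch, assuming the sentiment analysis pipeline accepts a list of sequences:

```python
# batched inference: pass a list of sequences (assumed supported for this task)
print(pipeline(["I love DeepSparse!", "Latency is low."]))
```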
### Server Example | Sentiment Analysis
Here's an example of how a task is used to create a Server:
```bash
deepsparse.server \
  --task sentiment_analysis \
  --model_path zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
```
Making a request:
```python
import requests

# Uvicorn is running on this port
url = "http://0.0.0.0:5543/v2/models/sentiment_analysis/infer"

# send the raw text to the endpoint
obj = {"sequences": "Sending requests to DeepSparse Server is fast and easy!"}
resp = requests.post(url=url, json=obj)

# receive the post-processed output
print(resp.text)
# >> {"labels":["positive"],"scores":[0.9330279231071472]}
```
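The same endpoint can be exercised from the command line as well; a minimal sketch using curl with the URL and payload shown above:

```bash
# POST the same JSON payload to the running server
curl -X POST http://0.0.0.0:5543/v2/models/sentiment_analysis/infer \
  -H "Content-Type: application/json" \
  -d '{"sequences": "Sending requests to DeepSparse Server is fast and easy!"}'
```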