Commit af19d35

Authored by ggerganov, kir-gadjello, and Tobi Lütke
server : OAI API compatibility (ggml-org#4198)
* Add openai-compatible POST /v1/chat/completions API endpoint to server example
* fix code style
* Update server README.md
* Improve server README.md
* Fix server.cpp code style according to review
* server : some style changes
* server : indentation
* server : enable special tokens during tokenization by default
* server : minor code style
* server : change random string generator
* straightforward /v1/models endpoint

---------

Co-authored-by: kir-gadjello <[email protected]>
Co-authored-by: Tobi Lütke <[email protected]>
1 parent e9c13ff commit af19d35

2 files changed: +413 −11 lines

examples/server/README.md (+49 lines)
@@ -234,6 +234,55 @@ node index.js
- **GET** `/props`: Return the required assistant name and anti-prompt to generate the prompt in case you have specified a system prompt for all slots.

- **POST** `/v1/chat/completions`: OpenAI-compatible Chat Completions API. Given a ChatML-formatted JSON description in `messages`, it returns the predicted completion. Both synchronous and streaming modes are supported, so scripted and interactive applications work fine. While no strong claim of compatibility with the OpenAI API spec is being made, in our experience it suffices to support many apps. Only ChatML-tuned models, such as Dolphin, OpenOrca, OpenHermes, OpenChat-3.5, etc., can be used with this endpoint. Compared to `api_like_OAI.py`, this API implementation does not require a wrapper to be served.

*Options:*

See [OpenAI Chat Completions API documentation](https://platform.openai.com/docs/api-reference/chat). While some OpenAI-specific features such as function calling aren't supported, llama.cpp `/completion`-specific features such as `mirostat` are supported (see the sketches after the examples below).
*Examples:*

You can either use the Python `openai` library with appropriate checkpoints:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ]
)

print(completion.choices[0].message)
```
... or raw HTTP requests:
```shell
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
    "role": "system",
    "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
},
{
    "role": "user",
    "content": "Write a limerick about python exceptions"
}
]
}'
```
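
Since the endpoint supports streaming, the same client can also consume tokens incrementally. Below is a minimal sketch, assuming a v1.x `openai` Python client, where `stream=True` and the per-chunk `delta` fields are standard client behavior rather than anything specific to this server:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required"
)

# stream=True makes the server send incremental chunks instead of one final message
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    stream=True
)

for chunk in stream:
    # each chunk carries a delta; content may be None on role-only or final chunks
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
print()
```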
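As mentioned under *Options:*, `/completion`-specific parameters can ride along with a standard request. A minimal sketch, assuming the v1.x client's `extra_body` argument (which merges extra keys into the request JSON) and the `mirostat*` parameter names from the `/completion` documentation:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required"
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    # extra_body keys are merged into the request JSON, where the server can
    # pick up /completion-style sampling options (an assumption based on the
    # compatibility note above)
    extra_body={
        "mirostat": 2,        # enable Mirostat 2.0 sampling
        "mirostat_tau": 5.0,  # target entropy
        "mirostat_eta": 0.1   # learning rate
    }
)

print(completion.choices[0].message.content)
```
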
## More examples

### Change system prompt on runtime
