Tokens and token quota limits

Tokens are small chunks of text, ranging from a single character to a whole word. Tokens are the units of measurement used to quantify the amount of text that the {ols-long} Service sends to, or receives from, a large language model (LLM). Every interaction between the Service and the LLM is counted in tokens.

Token quota limits define the number of tokens that can be used within a given time period. Implementing token quota limits helps control costs, encourages more efficient use of queries, and regulates system demands. In a multi-user configuration, token quota limits help provide equal access to all users, ensuring that everyone has an opportunity to submit queries.

You can define token quota limits for {ocp-short-name} clusters or {ocp-short-name} user accounts.
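
To illustrate how a quota of tokens per time period can be enforced at both the cluster level and the user level, the following is a minimal Python sketch. It is not the Service's implementation or configuration format: the names `TokenQuota` and `QuotaLimiter`, the fixed-window reset strategy, and the limits shown are all assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenQuota:
    """Hypothetical fixed-window quota: at most `limit` tokens per `period_s` seconds."""
    limit: int
    period_s: float
    window_start: float
    used: int = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.period_s:
            self.window_start = now
            self.used = 0

    def remaining(self, now: float) -> int:
        self._roll(now)
        return self.limit - self.used

    def consume(self, tokens: int, now: float) -> None:
        self._roll(now)
        self.used += tokens


class QuotaLimiter:
    """Hypothetical limiter that checks a cluster-wide quota and a per-user quota."""

    def __init__(self, cluster_limit: int, user_limit: int, period_s: float) -> None:
        self.cluster_limit = cluster_limit
        self.user_limit = user_limit
        self.period_s = period_s
        self._cluster: Optional[TokenQuota] = None
        self._users: Dict[str, TokenQuota] = {}

    def allow(self, user: str, tokens: int, now: Optional[float] = None) -> bool:
        """Return True and record usage only if both quotas can cover `tokens`."""
        now = time.monotonic() if now is None else now
        if self._cluster is None:
            self._cluster = TokenQuota(self.cluster_limit, self.period_s, now)
        if user not in self._users:
            self._users[user] = TokenQuota(self.user_limit, self.period_s, now)
        user_quota = self._users[user]

        # Reject without consuming anything if either quota would be exceeded.
        if tokens > user_quota.remaining(now) or tokens > self._cluster.remaining(now):
            return False
        user_quota.consume(tokens, now)
        self._cluster.consume(tokens, now)
        return True


# Example: 100 tokens per minute for the cluster, 60 per minute for each user.
limiter = QuotaLimiter(cluster_limit=100, user_limit=60, period_s=60)
first = limiter.allow("alice", 50, now=0.0)   # within both quotas
second = limiter.allow("alice", 20, now=1.0)  # would exceed the user quota
```

In this sketch, a rejected request consumes nothing, so one user exhausting a personal quota does not reduce what remains for others, while the cluster-wide quota still caps total demand.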