Kalliope API Documentation

Introduction

Kalliope is a deployment of the LLaMA-2 Large Language Model that exposes a subset of version 1 of the OpenAI API (Kalliope was also the Greek muse of eloquence and rhetoric).

Only two requests can be made so far: one to get the list of available models, and the other to request chat completions. Use of the API is restricted; to obtain the API key needed to use it, please fill out the API key request form. Once the form is submitted, we will review the application and send you the API key.

To ask questions or report bugs, please write to the same email address that sent you the API key (when reporting a bug, remember to quote the X-Request-Id header that is returned with every response).
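
One way to see the X-Request-Id of a response is to ask cURL to print the response headers, for example against the models endpoint described below:

# -D - dumps the response headers to standard output, -o /dev/null discards the body
curl -sS -X GET 'https://api.kalliope.bigtwitter.cloud.edu.au/v1/models' \
--header "Authorization: ${API_KEY}" \
-D - -o /dev/null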

Request the list of available models

A GET request that returns a JSON array containing the names of available models. A request using cURL may look like:

curl -X GET 'https://api.kalliope.bigtwitter.cloud.edu.au/v1/models' \
--header "Authorization: ${API_KEY}"

The response may look like:

["llama-2-70b-chat.q5_k_m"]

Request a chat completion

A chat completion allows the user to have a dialogue with an LLM, for instance by requesting an answer or the generation of some text. The request is structured as a series of messages to the LLM, possibly using different roles: user (you), assistant (the LLM), system (instructions to the assistant guiding the model’s behaviour).
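
For example, a messages array that combines the three roles (the contents below are purely illustrative) might look like:

"messages": [
  {"role": "system", "content": "You are a concise and helpful assistant."},
  {"role": "user", "content": "Who wrote the Odyssey?"},
  {"role": "assistant", "content": "The Odyssey is traditionally attributed to Homer."},
  {"role": "user", "content": "Summarise its plot in one sentence."}
]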

The POST request, in addition to the API Key, must include the name of the model and the maximum number of tokens to return (every token is about 0.75 of a word, and the higher the number of tokens, the slower the response).

A request may look like:

curl -X POST 'https://api.kalliope.bigtwitter.cloud.edu.au/v1/chat/completions' \
--header "Authorization: ${API_KEY}" \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama-2-70b-chat.q5_k_m",
  "max_tokens": 10,
  "messages": [
    {"role": "user", "content": "Who are you?"}
  ]
}'

The response may look like this:

{
  "id": "chatcmpl-ab997d28-dd78-40ba-bb8a-4dfacf15d832",
  "object": "chat.completion",
  "created": 1696464344,
  "model": "llama-2-70b-chat.q5_k_m",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Hello! My name is LLaMA, I'm a large language model trained by a team of researcher at Meta AI. My primary function is to understand and respond to human input in a helpful and engaging manner. I can answer questions, provide information, tell stories, and even generate poetry and songs. Is there anything specific you would like to know or talk about?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 78,
    "total_tokens": 106
  }
}
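
If jq is available, the generated text can be extracted directly from the response; for example, reusing the request above:

# -s silences the progress meter; jq -r prints the content string without JSON quoting
curl -s -X POST 'https://api.kalliope.bigtwitter.cloud.edu.au/v1/chat/completions' \
--header "Authorization: ${API_KEY}" \
--header 'Content-Type: application/json' \
--data '{"model": "llama-2-70b-chat.q5_k_m", "max_tokens": 10, "messages": [{"role": "user", "content": "Who are you?"}]}' \
| jq -r '.choices[0].message.content'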

Kalliope does not keep track of conversations, hence every request is independent of all the others.
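
To continue a dialogue, the previous exchanges therefore have to be resent as part of the messages array of the next request. A sketch (the conversation below is purely illustrative):

# Earlier user/assistant turns are included before the new user message
curl -X POST 'https://api.kalliope.bigtwitter.cloud.edu.au/v1/chat/completions' \
--header "Authorization: ${API_KEY}" \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama-2-70b-chat.q5_k_m",
  "max_tokens": 50,
  "messages": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "Hello! My name is LLaMA, a large language model trained by Meta AI."},
    {"role": "user", "content": "Can you answer questions about Greek mythology?"}
  ]
}'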

To ensure the smooth functioning of the system, requests from the same user are capped (say, 30 per minute); exceeding this cap results in an HTTP status of 429. Depending on the number of requests, the system may be relatively slow, possibly incurring timeouts (HTTP status of 504).
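
One way to cope with occasional 429 or 504 responses is to let cURL retry the request after a delay (recent versions of cURL treat 429 and 5xx status codes as transient errors when --retry is given); a sketch:

# Retry up to 5 times, waiting 10 seconds between attempts
curl -X POST 'https://api.kalliope.bigtwitter.cloud.edu.au/v1/chat/completions' \
--retry 5 --retry-delay 10 \
--header "Authorization: ${API_KEY}" \
--header 'Content-Type: application/json' \
--data '{"model": "llama-2-70b-chat.q5_k_m", "max_tokens": 10, "messages": [{"role": "user", "content": "Who are you?"}]}'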