# API Documentation

Learn how to build AI applications with the Inference Space API.

## Quick Start

### Base URL

```
https://api.inference.space/v1
```

OpenAI SDK compatible: just change the base URL and the API key.
### curl

```bash
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Authentication

All API requests must include your API key in the `Authorization` header.
```
Authorization: Bearer sk-your-api-key
```

> **Security Note**
> Never expose your API key in client-side code. Use environment variables and server-side requests.
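A minimal server-side sketch of that advice, reading the key from an environment variable instead of hard-coding it (the variable name `INFERENCE_SPACE_API_KEY` is just an example convention, not something the API requires):

```python
import os

def load_api_key(env_var: str = "INFERENCE_SPACE_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the server")
    return key

# Demo only: in a real deployment the variable is set outside the process,
# e.g. `export INFERENCE_SPACE_API_KEY=sk-your-api-key`.
os.environ.setdefault("INFERENCE_SPACE_API_KEY", "sk-your-api-key")

headers = {"Authorization": f"Bearer {load_api_key()}"}
print(headers["Authorization"])
```

Keeping the key server-side means browser or mobile clients call your backend, and only the backend talks to the API.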
## Models

Retrieve the list of all available models.

`GET /v1/models`

```bash
curl https://api.inference.space/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
```

**Response**

```json
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-chat",
      "object": "model",
      "owned_by": "deepseek"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    }
  ]
}
```

## Chat Completions
Send messages and receive AI replies; the endpoint is compatible with the OpenAI API format.

`POST /v1/chat/completions`

```json
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```

**Response**

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```

## Streaming
Set `stream: true` to receive a streaming response.

**Streaming Request**

```json
{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Tell me a joke"}],
  "stream": true
}
```

**Stream Events**

```
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}
data: [DONE]
```

## Error Codes
| HTTP Status | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | Insufficient credits for paid models |
| 429 | Too Many Requests | Rate limit exceeded (RPM or RPD) |
| 400 | Bad Request | Invalid request body or parameters |
| 500 | Internal Error | Upstream provider error or server issue |
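One common way to act on these codes is to treat 429 and 5xx as transient (worth retrying) and the remaining 4xx codes as caller errors. A sketch under that assumption; the helper names are ours, not part of the API:

```python
def should_retry(status: int) -> bool:
    """429 (rate limit) and 5xx (upstream/server issues) are transient;
    4xx codes like 400/401/402 indicate a problem with the request itself."""
    return status == 429 or status >= 500

def describe(status: int) -> str:
    # Mirrors the table above; unknown codes fall through to a generic label.
    return {
        400: "Bad Request: invalid request body or parameters",
        401: "Unauthorized: invalid or missing API key",
        402: "Payment Required: insufficient credits for paid models",
        429: "Too Many Requests: rate limit exceeded (RPM or RPD)",
        500: "Internal Error: upstream provider error or server issue",
    }.get(status, f"HTTP {status}")

print(should_retry(429), should_retry(401))
```

Requests failing with 400/401/402 should be fixed and not resent unchanged; retrying them only burns rate limit.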
## Rate Limits

Each API key is limited to 20 RPM / 50 RPD by default. Rate-limit information is included in the response headers.
| Header | Description |
|---|---|
| X-RateLimit-Remaining-RPM | Remaining requests per minute |
| X-RateLimit-Remaining-RPD | Remaining requests per day |
| Retry-After | Seconds to wait before retrying (on 429) |
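On a 429, waiting for the number of seconds in `Retry-After` before resending is usually enough. A sketch with a pluggable `send` callable standing in for the actual HTTP request (an assumption for illustration, not an SDK API):

```python
import time

def send_with_retry(send, max_attempts: int = 3):
    """`send` performs one HTTP request and returns (status, headers, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # On 429, honor the server's requested back-off (default 1s if absent).
        wait = int(headers.get("Retry-After", 1))
        time.sleep(wait)
    return status, body

# Fake transport for demonstration: rate-limited once, then succeeds.
responses = iter([
    (429, {"Retry-After": "0"}, ""),
    (200, {"X-RateLimit-Remaining-RPM": "19"}, '{"ok": true}'),
])
status, body = send_with_retry(lambda: next(responses))
print(status, body)  # 200 {"ok": true}
```

Watching `X-RateLimit-Remaining-RPM` / `-RPD` lets a client slow down before it ever hits a 429.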
## Code Examples

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

### JavaScript (Node.js)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.inference.space/v1',
});

const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
```

### Python (Streaming)

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

### curl

```bash
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```
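For clients that don't use an SDK, the `data:` lines shown under Stream Events can be assembled by hand. A minimal stdlib sketch over already-received SSE lines (a real client would read them incrementally from the HTTP response):

```python
import json

def collect_stream(lines):
    """Concatenate the delta.content fields from SSE `data:` lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        # The first event typically carries only the role, no content.
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# The example events from the Stream Events section above.
events = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}',
    'data: [DONE]',
]
print(collect_stream(events))  # Why did
```

Note that `[DONE]` must be checked before `json.loads`, since it is a plain sentinel string rather than a JSON payload.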