Inference Space

API Documentation

Learn how to build AI applications with the Inference Space API.

Quick Start

Base URL

https://api.inference.space/v1

OpenAI SDK compatible — just change the base URL and API key.

curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

All API requests must include your API key in the Authorization header.

Authorization: Bearer sk-your-api-key

Security Note

Never expose your API key in client-side code. Use environment variables and server-side requests.
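For example, the key can be read from an environment variable on the server instead of being hard-coded. The variable name INFERENCE_API_KEY below is an assumption for illustration; use whatever your deployment defines.

```python
import os

# Assumed variable name -- set it in your server environment,
# never in client-side code.
api_key = os.environ.get("INFERENCE_API_KEY", "")

# Headers every request needs.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```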

Models

Retrieve a list of all available models.

GET /v1/models
curl https://api.inference.space/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-chat",
      "object": "model",
      "owned_by": "deepseek"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    }
  ]
}
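A minimal sketch of pulling the model IDs out of this response with the standard library (the payload is the example shown above):

```python
import json

# The example response body from the models endpoint:
response_body = """
{
  "object": "list",
  "data": [
    {"id": "deepseek-chat", "object": "model", "owned_by": "deepseek"},
    {"id": "gpt-4o", "object": "model", "owned_by": "openai"}
  ]
}
"""

# Each entry in "data" describes one model; "id" is what you pass
# as the "model" parameter in chat completion requests.
model_ids = [m["id"] for m in json.loads(response_body)["data"]]
print(model_ids)  # ['deepseek-chat', 'gpt-4o']
```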

Chat Completions

Send messages and receive an AI reply. The endpoint is compatible with the OpenAI API format.

POST /v1/chat/completions
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming

Set stream: true to receive a streamed response.

Streaming Request
{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Tell me a joke"}],
  "stream": true
}
Stream Events
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}

data: [DONE]
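The stream is delivered as server-sent events: each data: line carries a JSON chunk, and the literal [DONE] sentinel ends the stream. A minimal sketch of assembling the reply from raw lines (the helper name iter_sse_content is ours, not part of the API):

```python
import json

def iter_sse_content(lines):
    """Yield content deltas from raw SSE lines of a streaming response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel that ends the stream
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:  # the first chunk often carries only the role
            yield content

# The events shown above:
sample = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # Why did
```

In practice the OpenAI SDK does this parsing for you (see the Python streaming example below in Code Examples); the sketch only shows what travels over the wire.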

Error Codes

HTTP Status  Error              Description
400          Bad Request        Invalid request body or parameters
401          Unauthorized       Invalid or missing API key
402          Payment Required   Insufficient credits for paid models
429          Too Many Requests  Rate limit exceeded (RPM or RPD)
500          Internal Error     Upstream provider error or server issue

Rate Limits

Each API key is limited to 20 RPM / 50 RPD by default. Rate-limit information is included in the response headers.

Header                     Description
X-RateLimit-Remaining-RPM  Remaining requests per minute
X-RateLimit-Remaining-RPD  Remaining requests per day
Retry-After                Seconds to wait before retrying (on 429)
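One way to respect these headers is to back off on 429 using Retry-After. A sketch, with the actual HTTP call injected as a placeholder function (the send_request signature below is our assumption, not part of any SDK):

```python
import time

def call_with_retry(send_request, max_retries=3):
    """Call send_request(); on 429, wait Retry-After seconds and retry.

    send_request must return (status_code, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send_request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; return the 429
        # Fall back to a small default wait if the header is missing.
        wait = float(headers.get("Retry-After", 1))
        time.sleep(wait)
    return status, body

# Stub that returns 429 once, then succeeds:
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] == 1:
        return 429, {"Retry-After": "0"}, None
    return 200, {}, '{"ok": true}'

print(call_with_retry(fake_send))  # (200, '{"ok": true}')
```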

Code Examples

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
JavaScript (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.inference.space/v1',
});

const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
Python (Streaming)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'