Inference Space

API Documentation

Learn how to build AI applications with the Inference Space API.

Quick Start

Base URL

https://api.inference.space/v1

OpenAI SDK compatible — just change the base URL and API key.

curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

All API requests must include your API key in the Authorization header.

Authorization: Bearer sk-your-api-key

Security Note

Never expose your API key in client-side code. Use environment variables and server-side requests.
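For example, the key can be read from an environment variable on the server instead of being hard-coded. The variable name INFERENCE_API_KEY below is an assumption for illustration; use whatever your deployment defines.

```python
import os

# Assumed variable name -- set it in your server environment,
# never in client-side code.
api_key = os.environ.get("INFERENCE_API_KEY", "")

# Headers every request needs.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```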

Models

Retrieve a list of all available models.

GET /v1/models
curl https://api.inference.space/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-chat",
      "object": "model",
      "owned_by": "deepseek"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    }
  ]
}
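A minimal sketch of pulling the model IDs out of this response with the standard library (the payload is the example shown above):

```python
import json

# The example response body from the models endpoint:
response_body = """
{
  "object": "list",
  "data": [
    {"id": "deepseek-chat", "object": "model", "owned_by": "deepseek"},
    {"id": "gpt-4o", "object": "model", "owned_by": "openai"}
  ]
}
"""

# Each entry in "data" describes one model; "id" is what you pass
# as the "model" parameter in chat completion requests.
model_ids = [m["id"] for m in json.loads(response_body)["data"]]
print(model_ids)  # ['deepseek-chat', 'gpt-4o']
```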

Chat Completions

Send messages and receive an AI reply. The endpoint is compatible with the OpenAI API format.

POST /v1/chat/completions
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming

Set stream: true to receive a streamed response.

Streaming Request
{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Tell me a joke"}],
  "stream": true
}
Stream Events
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}

data: [DONE]
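The stream is delivered as server-sent events: each data: line carries a JSON chunk, and the literal [DONE] sentinel ends the stream. A minimal sketch of assembling the reply from raw lines (the helper name iter_sse_content is ours, not part of the API):

```python
import json

def iter_sse_content(lines):
    """Yield content deltas from raw SSE lines of a streaming response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel that ends the stream
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:  # the first chunk often carries only the role
            yield content

# The events shown above:
sample = [
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}',
    'data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # Why did
```

In practice the OpenAI SDK does this parsing for you (see the Python streaming example below in Code Examples); the sketch only shows what travels over the wire.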

Error Codes

HTTP Status  Error              Description
400          Bad Request        Invalid request body or parameters
401          Unauthorized       Invalid or missing API key
402          Payment Required   Insufficient credits for paid models
429          Too Many Requests  Rate limit exceeded (RPM or RPD)
500          Internal Error     Upstream provider error or server issue

Rate Limits

Each API key is limited to 20 RPM / 50 RPD by default. Rate-limit information is included in the response headers.

Header                     Description
X-RateLimit-Remaining-RPM  Remaining requests per minute
X-RateLimit-Remaining-RPD  Remaining requests per day
Retry-After                Seconds to wait before retrying (on 429)
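One way to respect these headers is to back off on 429 using Retry-After. A sketch, with the actual HTTP call injected as a placeholder function (the send_request signature below is our assumption, not part of any SDK):

```python
import time

def call_with_retry(send_request, max_retries=3):
    """Call send_request(); on 429, wait Retry-After seconds and retry.

    send_request must return (status_code, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send_request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; return the 429
        # Fall back to a small default wait if the header is missing.
        wait = float(headers.get("Retry-After", 1))
        time.sleep(wait)
    return status, body

# Stub that returns 429 once, then succeeds:
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] == 1:
        return 429, {"Retry-After": "0"}, None
    return 200, {}, '{"ok": true}'

print(call_with_retry(fake_send))  # (200, '{"ok": true}')
```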

Code Examples

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
JavaScript (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.inference.space/v1',
});

const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
Python (Streaming)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'