QYUAN AI API Documentation

Quickstart

If you already use the OpenAI SDK, start here. This is the lowest-friction migration path.

You can think of QYUAN AI as a unified API gateway. Right now there are two recommended integration patterns:

OpenAI-compatible: best for most SDKs and apps, using /v1/chat/completions or /v1/responses.
Claude Messages: if your existing code already follows the Anthropic format, you can call /v1/messages directly.

OpenAI Base URL: https://token.qyuanai.com/v1

Claude endpoint: https://token.qyuanai.com/v1/messages

Authentication: send the header Authorization: Bearer YOUR_API_KEY to OpenAI-compatible endpoints, and the header x-api-key: YOUR_API_KEY to Claude Messages.

Create an API key in the console before making requests. Do not guess model names; call GET /v1/models first to fetch the list of currently available models.
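The two authentication styles described above can be sketched as a small helper. This is a minimal illustration rather than part of any SDK: the function name auth_headers and its style argument are assumptions made for this example.

```python
def auth_headers(api_key: str, style: str = "openai") -> dict[str, str]:
    """Return request headers for one of the two integration styles.

    `style` is a hypothetical switch used only in this sketch:
    "openai" for OpenAI-compatible endpoints, "claude" for /v1/messages.
    """
    if style == "openai":
        # OpenAI-compatible endpoints expect a Bearer token.
        return {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
    if style == "claude":
        # Claude Messages expects the key in x-api-key instead.
        return {
            "x-api-key": api_key,
            "Content-Type": "application/json",
        }
    raise ValueError(f"unknown style: {style!r}")
```

Keeping the choice in one place makes it harder to send a Bearer token to /v1/messages by accident.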

Interface overview

Models on this platform no longer share a single request shape, and the most common integration mistake is assuming that they do. Most text models can use the OpenAI-compatible interface, but image generation and Claude native requests require different payload formats.

Model type	Recommended endpoint	Core payload shape
GPT / Codex / Claude / Gemini text models	`POST /v1/chat/completions`	`model + messages + max_tokens`
Gemini models through this gateway	`POST /v1/chat/completions`	Still use OpenAI-style `messages`. Do not switch to Google-native `contents`.
Pure image generation	`POST /v1/images/generations`	`model + prompt + size + n`
OpenAI Responses-style apps	`POST /v1/responses`	`model + input + max_output_tokens`
Claude native format	`POST /v1/messages`	`model + max_tokens + messages` with `x-api-key`
Model discovery	`GET /v1/models`	Lists currently available models. Recommended before every new integration.
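The payload shapes in the table can be written down as small builder functions. This is a sketch under assumptions: the function names, the ENDPOINTS mapping, and the default values are chosen for illustration and are not defined by the platform.

```python
# Endpoint paths as listed in the table above.
ENDPOINTS = {
    "chat": "/v1/chat/completions",
    "image": "/v1/images/generations",
    "responses": "/v1/responses",
    "claude": "/v1/messages",
}

def chat_payload(model, messages, max_tokens=512):
    """OpenAI-style chat payload; Gemini models use this shape here too."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def image_payload(model, prompt, size="1024x1024", n=1):
    """Payload for pure image generation."""
    return {"model": model, "prompt": prompt, "size": size, "n": n}

def responses_payload(model, text, max_output_tokens=512):
    """Payload for Responses-style apps: `input` replaces `messages`."""
    return {"model": model, "input": text, "max_output_tokens": max_output_tokens}

def claude_payload(model, messages, max_tokens=1024):
    """Claude native payload; sent with x-api-key rather than Bearer."""
    return {"model": model, "max_tokens": max_tokens, "messages": messages}
```

Note that only the Responses shape uses input and max_output_tokens; the chat and Claude shapes both use messages and max_tokens.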

Model categories

As of 2026-05-15, the models returned by the API fall into these categories:

Category	Current models	Recommended integration
OpenAI / Codex text models	`gpt-5.2`, `gpt-5.3-codex`, `gpt-5.3-codex-spark`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.5`, `gpt-oss-120b-medium`	`/v1/chat/completions` or `/v1/responses`
Claude models	`claude-sonnet-4-5`, `claude-sonnet-4-6`, `claude-opus-4-6`, `claude-opus-4-6-thinking`, `claude-opus-4-7`	`/v1/chat/completions` or `/v1/messages`
Gemini text / reasoning models	`gemini-2.5-flash`, `gemini-2.5-flash-lite`, `gemini-2.5-pro`, `gemini-3-flash`, `gemini-3-flash-preview`, `gemini-3-pro-low`, `gemini-3-pro-high`, `gemini-3-pro-preview`, `gemini-3.1-flash-lite`, `gemini-3.1-flash-lite-preview`, `gemini-3.1-pro-low`, `gemini-3.1-pro-high`, `gemini-3.1-pro-preview`	`/v1/chat/completions`
Image models	`gpt-image-2`	`/v1/images/generations`
Gemini image-capable models	`gemini-3.1-flash-image`	For now, test and integrate it through `/v1/chat/completions`

The safest pattern is still to request GET /v1/models first and read model names directly from the response.

curl https://token.qyuanai.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
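The same model lookup can be done from Python with only the standard library. This sketch assumes the response follows the OpenAI-style shape {"data": [{"id": "..."}]}, which this page does not spell out; the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://token.qyuanai.com/v1"

def extract_model_ids(payload: dict) -> list[str]:
    """Pull model names out of a /v1/models response body.

    Assumes the OpenAI-style shape {"data": [{"id": "..."}, ...]}.
    """
    return sorted(item["id"] for item in payload.get("data", []))

def list_models(api_key: str) -> list[str]:
    """Fetch the model names currently served by the gateway."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

Calling list_models("YOUR_API_KEY") at startup and validating configured model names against the result avoids hard-coding names that may change.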

OpenAI Chat Completions

This is the most universal integration path right now. GPT, Claude, and Gemini models can all be tested through this format first.

Default payload: model + messages, plus optional max_tokens and stream.

curl example

curl https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Introduce QYUAN AI in one sentence."}
    ],
    "max_tokens": 512
  }'

Python example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://token.qyuanai.com/v1"
)

resp = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce QYUAN AI in one sentence."}
    ],
    max_tokens=512,
)

print(resp.choices[0].message.content)

JavaScript example

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QYUAN_API_KEY,
  baseURL: "https://token.qyuanai.com/v1",
});

const resp = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Introduce QYUAN AI in one sentence." }
  ],
  max_tokens: 512
});

console.log(resp.choices[0].message.content);

Gemini integration

This model family is already normalized into the OpenAI-compatible format on this platform, so do not copy Google-native contents, parts, or generateContent request shapes directly. The simplest path is to keep using /v1/chat/completions.

Recommended endpoint: POST https://token.qyuanai.com/v1/chat/completions

Supported models: gemini-2.5-*, gemini-3-*, gemini-3.1-*

Minimum payload: model and messages, with optional max_tokens and stream
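If existing code already builds Google-native contents, a small converter can map it to OpenAI-style messages before calling the gateway. This is a minimal sketch that handles plain-text parts only; the function name is illustrative, and anything beyond text (images, function calls) is skipped.

```python
def contents_to_messages(contents, system_instruction=None):
    """Convert Google-native `contents` into OpenAI-style `messages`.

    Only plain-text parts are handled; other part types are skipped.
    """
    messages = []
    if system_instruction:
        messages.append({"role": "system", "content": system_instruction})
    for item in contents:
        # Google uses "model" where the OpenAI format uses "assistant".
        role = "assistant" if item.get("role") == "model" else "user"
        text = "".join(p["text"] for p in item.get("parts", []) if "text" in p)
        messages.append({"role": role, "content": text})
    return messages
```

The result can be passed directly as the messages field of a /v1/chat/completions request.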

Gemini text model curl example

curl https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Reply with ok only."}
    ],
    "max_tokens": 64
  }'

Gemini image-capable model test example

On this platform, gemini-3.1-flash-image is currently best tested through the same chat/completions format first.

curl https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-flash-image",
    "messages": [
      {"role": "user", "content": "Reply with ok only."}
    ],
    "max_tokens": 64
  }'

Image generation

If you want direct image generation, do not call /v1/chat/completions. Use the dedicated /v1/images/generations endpoint instead.

Default payload: model + prompt + size + n. The recommended pure image model right now is gpt-image-2.

curl example

curl https://token.qyuanai.com/v1/images/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A minimalist product poster featuring a blue cube, white background, studio lighting",
    "n": 1,
    "size": "1024x1024"
  }'

Python example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://token.qyuanai.com/v1"
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A minimalist product poster featuring a blue cube, white background, studio lighting",
    size="1024x1024",
    n=1,
)

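image = result.data[0]
# Depending on the model and gateway settings, the image may come back as
# base64 data (b64_json) or as a URL, so handle both.
print(image.b64_json[:80] if image.b64_json else image.url)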
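When the response carries base64 data in b64_json, it usually needs to be decoded and written to disk. The helper below is a sketch; its name and the choice of output path are assumptions, and it does not inspect the image format.

```python
import base64
from pathlib import Path

def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 image string and write it to `path`.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)
```

For example, save_b64_image(result.data[0].b64_json, "poster.png") would store the first generated image, assuming the gateway returned base64 data and the image is a PNG.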

Streaming

Text models currently support SSE streaming responses. When testing in a terminal, add curl's -N flag to disable output buffering so that events appear as they arrive.

curl -N https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Output three lines: hello, qyuan, ai"}
    ]
  }'

The response is a standard SSE stream: each event is a data: {...} line, and the stream ends with data: [DONE].
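The event stream can be consumed line by line. The sketch below assumes each event is an OpenAI-style chat completion chunk with the text under choices[0].delta.content; that chunk layout is an assumption based on the OpenAI-compatible format, and the function name is illustrative.

```python
import json
from typing import Iterable, Iterator

def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an SSE stream of chat completion chunks.

    Assumes OpenAI-style chunks (choices[].delta.content) and stops
    at the [DONE] sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Any iterable of decoded lines works as input, for example the lines of an HTTP response body read incrementally.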

Responses API

If your SDK or app has already moved to OpenAI's newer unified interface, you can call /v1/responses directly.

Default payload: model + input + optional max_output_tokens.

curl https://token.qyuanai.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "input": "Reply with ok only.",
    "max_output_tokens": 32
  }'

If your project already relies heavily on chat.completions, there is no need to migrate to responses just because it is newer. Both are currently supported. Choose based on your existing code structure.
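A Responses-style body nests the generated text more deeply than chat completions does. The sketch below extracts it from a parsed JSON response, assuming the gateway returns the OpenAI Responses layout (output items of type message containing output_text blocks); the function name is illustrative.

```python
def extract_output_text(payload: dict) -> str:
    """Concatenate all output_text blocks from a /v1/responses body."""
    parts = []
    for item in payload.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as reasoning or tool calls
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)
```

Recent versions of the OpenAI Python SDK expose the same result as resp.output_text on the object returned by client.responses.create, so this helper is mainly useful when calling the endpoint without the SDK.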

Claude Messages

If your current Claude integration already follows the Anthropic format, you can keep using it here.

Default payload: model + max_tokens + messages. The auth header must be x-api-key, not Authorization: Bearer.

curl https://token.qyuanai.com/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

We currently recommend claude-sonnet-4-5 or claude-sonnet-4-6 for most Claude usage. If you prefer to keep one shared integration style, you can also call Claude models through OpenAI-compatible /v1/chat/completions.
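The Claude Messages call can also be made from Python without an SDK. This sketch mirrors the curl example and assumes the response follows the Anthropic layout, where text lives in content blocks of type text; the function names are illustrative.

```python
import json
import urllib.request

MESSAGES_URL = "https://token.qyuanai.com/v1/messages"

def build_messages_request(api_key, model, user_text, max_tokens=1024):
    """Build a POST request in the Anthropic Messages format."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    headers = {
        "x-api-key": api_key,  # not Authorization: Bearer
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

def extract_text(payload: dict) -> str:
    """Join the text blocks of a Messages response body."""
    return "".join(
        block.get("text", "")
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )
```

To send it, open the request with urllib.request.urlopen(req), parse the body with json.load, and pass the result to extract_text.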

Referral rewards

To avoid ambiguity, the current referral reward policy is defined as follows:

WeCom support withdrawal notes

Referral rewards are accumulated inside the console. In-site usage and withdrawal follow the rules below.

The referrer receives 5% of the invited user's top-up amount.

Rewards are visible in the console and can be transferred into in-site balance for usage.

After cumulative referral earnings reach 100 USD, you can add WeCom support to request a withdrawal.

Scan the QR code to add the WeCom support contact, Hanson. Once you reach the withdrawal threshold, support can help process the withdrawal.
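The reward arithmetic can be checked with a few lines of code. This is a sketch under assumptions: it treats every top-up as a USD amount and ignores any rounding or settlement rules the console may apply, and the function names are illustrative.

```python
from decimal import Decimal

REFERRAL_RATE = Decimal("0.05")        # 5% of the invited user's top-up
WITHDRAWAL_THRESHOLD = Decimal("100")  # USD of cumulative referral earnings

def referral_reward(top_up_usd: Decimal) -> Decimal:
    """Reward earned by the referrer for a single top-up."""
    return top_up_usd * REFERRAL_RATE

def can_request_withdrawal(top_ups_usd) -> bool:
    """True once cumulative rewards reach the withdrawal threshold."""
    total = sum((referral_reward(t) for t in top_ups_usd), Decimal("0"))
    return total >= WITHDRAWAL_THRESHOLD
```

At a 5% rate, reaching the 100 USD threshold corresponds to 2,000 USD of cumulative top-ups by invited users.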

Not recommended right now

To keep this documentation aligned with the platform's actual production-ready capabilities, the following are intentionally not covered here:

/v1/embeddings: there is currently no active embedding channel.
Audio, files, Assistants, fine-tuning, and other interfaces are not currently documented as primary external capabilities.

FAQ

1. What should I use as the Base URL?

For OpenAI-compatible SDKs, use https://token.qyuanai.com/v1.

2. Why do I get a “model not found” error?

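The model name in your request most likely does not match any name the gateway currently serves. Always trust the result of GET /v1/models, and do not reuse model names from other platforms.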
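A quick way to catch this before sending a request is to compare the configured name against the list from GET /v1/models and suggest near matches. The helper below is a sketch using Python's standard difflib module; the function name and the similarity cutoff are assumptions.

```python
import difflib

def check_model_name(requested: str, available: list[str]):
    """Return (is_valid, suggestions) for a requested model name.

    `available` is expected to come from GET /v1/models.
    """
    if requested in available:
        return True, []
    # Suggest up to three names that look similar to the requested one.
    suggestions = difflib.get_close_matches(requested, available, n=3, cutoff=0.6)
    return False, suggestions
```

A misspelled name such as gpt-5.4mini would then be reported as invalid, with gpt-5.4-mini offered as the closest match.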

3. Do Claude models have to use `/v1/messages`?

No. You can also use /v1/chat/completions as long as the model name is a Claude model.

4. Why can’t I copy Google-native Gemini examples directly?

Because Gemini is normalized into an OpenAI-compatible interface on this platform. The recommended path is /v1/chat/completions with messages.

5. Which endpoint should I use for image generation?

Use /v1/images/generations for pure image generation. The currently recommended model is gpt-image-2.

6. How can I verify that my API key works?

The simplest method is to call GET /v1/models. If it returns a model list, authentication is working.
