QYUAN AI: Unified API access guide for production use

Unified LLM API access for production

This page covers only the QYUAN AI endpoints that are currently supported and verified in production. Every example below already uses the QYUAN AI base URL: https://token.qyuanai.com.

OpenAI compatible | Claude Messages compatible | Streaming supported | Docs verified: 2026-05-15

Quickstart

If you already use the OpenAI SDK, start here. This is the lowest-friction migration path.

You can think of QYUAN AI as a unified API gateway. Right now there are two recommended integration patterns:

  • OpenAI-compatible: best for most SDKs and apps, using /v1/chat/completions or /v1/responses.
  • Claude Messages: if your existing code already follows the Anthropic format, you can call /v1/messages directly.
OpenAI Base URL: https://token.qyuanai.com/v1
Claude endpoint: https://token.qyuanai.com/v1/messages
Authentication: Use Authorization: Bearer YOUR_API_KEY for OpenAI-compatible endpoints. Use x-api-key: YOUR_API_KEY for Claude Messages.

Create an API key in the console before making requests. Do not guess model names; call GET /v1/models first to fetch the currently available model list.
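
The two authentication styles can be captured in one small helper. This is a minimal sketch, not part of any SDK: the name build_headers and its style argument are illustrative, and the anthropic-version value is simply copied from the Claude Messages curl example in this guide.

```python
def build_headers(api_key: str, style: str = "openai") -> dict:
    """Return request headers for the chosen integration style.

    style="openai" -> Authorization: Bearer ...  (OpenAI-compatible endpoints)
    style="claude" -> x-api-key: ...             (Claude Messages endpoint)
    """
    headers = {"Content-Type": "application/json"}
    if style == "openai":
        headers["Authorization"] = f"Bearer {api_key}"
    elif style == "claude":
        headers["x-api-key"] = api_key
        # Value copied from the Claude Messages curl example in this guide.
        headers["anthropic-version"] = "2023-06-01"
    else:
        raise ValueError(f"unknown style: {style!r}")
    return headers
```

Keeping the choice in one place makes it harder to send a Bearer token to /v1/messages by accident.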

Interface overview

Models on this platform no longer share a single request shape. The most common integration mistake is assuming they do: most text models can use the OpenAI-compatible interface, but image generation and Claude native requests require different payload formats.

Model type | Recommended endpoint | Core payload shape
GPT / Codex / Claude / Gemini text models | POST /v1/chat/completions | model + messages + max_tokens
Gemini models through this gateway | POST /v1/chat/completions | Still use OpenAI-style messages. Do not switch to Google-native contents.
Pure image generation | POST /v1/images/generations | model + prompt + size + n
OpenAI Responses-style apps | POST /v1/responses | model + input + max_output_tokens
Claude native format | POST /v1/messages | model + max_tokens + messages, sent with the x-api-key header

GET /v1/models lists the currently available models. Calling it is recommended before every new integration.
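
A payload can be checked against the endpoint's core fields before it is sent, which catches the shape mix-ups described above. The sketch below is a client-side lint only. The names CORE_FIELDS and missing_fields are hypothetical, and the field lists are transcribed from this guide (with max_tokens and max_output_tokens treated as optional where the guide says so); which fields the gateway strictly enforces is not stated here.

```python
# Core fields per endpoint, transcribed from this guide.
CORE_FIELDS = {
    "/v1/chat/completions": ("model", "messages"),
    "/v1/images/generations": ("model", "prompt", "size", "n"),
    "/v1/responses": ("model", "input"),
    "/v1/messages": ("model", "max_tokens", "messages"),
}


def missing_fields(endpoint: str, payload: dict) -> list:
    """List the core fields for this endpoint that are absent from the payload."""
    try:
        expected = CORE_FIELDS[endpoint]
    except KeyError:
        raise ValueError(f"endpoint not covered by this guide: {endpoint}") from None
    return [name for name in expected if name not in payload]
```

For example, a Google-native body such as {"contents": [...]} sent to /v1/chat/completions is reported as missing both model and messages.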

Model categories

As of 2026-05-15, the models returned by the API fall into the following categories:

Category | Current models | Recommended integration
OpenAI / Codex text models | gpt-5.2, gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.4, gpt-5.4-mini, gpt-5.5, gpt-oss-120b-medium | /v1/chat/completions or /v1/responses
Claude models | claude-sonnet-4-5, claude-sonnet-4-6, claude-opus-4-6, claude-opus-4-6-thinking, claude-opus-4-7 | /v1/chat/completions or /v1/messages
Gemini text / reasoning models | gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3-flash, gemini-3-flash-preview, gemini-3-pro-low, gemini-3-pro-high, gemini-3-pro-preview, gemini-3.1-flash-lite, gemini-3.1-flash-lite-preview, gemini-3.1-pro-low, gemini-3.1-pro-high, gemini-3.1-pro-preview | /v1/chat/completions
Image models | gpt-image-2 | /v1/images/generations
Gemini image-capable models | gemini-3.1-flash-image | For now, test and integrate it through /v1/chat/completions
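
The category table can be expressed as a small routing function. This is a sketch under a stated assumption: it matches on name prefixes that happen to fit the model list as of 2026-05-15, and future model families may not follow the same naming. The function name recommended_endpoints is illustrative.

```python
def recommended_endpoints(model: str) -> list:
    """Map a model name to the endpoints this guide recommends for it."""
    # Check the image prefix first, because "gpt-image-" also starts with "gpt-".
    if model.startswith("gpt-image-"):
        return ["/v1/images/generations"]
    if model.startswith("claude-"):
        return ["/v1/chat/completions", "/v1/messages"]
    if model.startswith("gemini-"):
        # Includes gemini-3.1-flash-image, which is routed to chat completions for now.
        return ["/v1/chat/completions"]
    if model.startswith("gpt-"):
        return ["/v1/chat/completions", "/v1/responses"]
    raise ValueError(f"no recommendation for model: {model!r}")
```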

The safest pattern is still to request GET /v1/models first and read model names directly from the response.

curl https://token.qyuanai.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
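
The same call can be made from Python with the standard library. The sketch below assumes the response follows the OpenAI-style list shape, {"data": [{"id": "..."}]}, which this guide does not spell out. The helper names are illustrative, and fetch_model_ids performs a real network request when called.

```python
import json
import urllib.request

BASE_URL = "https://token.qyuanai.com/v1"


def extract_model_ids(body: dict) -> list:
    """Pull model IDs out of a /v1/models response body.

    Assumes the OpenAI-style shape {"data": [{"id": "..."}, ...]}.
    """
    return sorted(item["id"] for item in body.get("data", []) if "id" in item)


def fetch_model_ids(api_key: str) -> list:
    """Call GET /v1/models and return the sorted model IDs (network request)."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

Calling fetch_model_ids("YOUR_API_KEY") once at startup and validating configured model names against the result avoids hard-coding names that may change.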

OpenAI Chat Completions

This is the most universal integration path right now. GPT, Claude, and Gemini models can all be tested through this format first.

Default payload: model + messages + optional max_tokens / stream

curl example

curl https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Introduce QYUAN AI in one sentence."}
    ],
    "max_tokens": 512
  }'

Python example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://token.qyuanai.com/v1"
)

resp = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce QYUAN AI in one sentence."}
    ],
    max_tokens=512,
)

print(resp.choices[0].message.content)

JavaScript example

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QYUAN_API_KEY,
  baseURL: "https://token.qyuanai.com/v1",
});

const resp = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Introduce QYUAN AI in one sentence." }
  ],
  max_tokens: 512
});

console.log(resp.choices[0].message.content);

Gemini integration

Gemini models are already normalized into the OpenAI-compatible format on this platform, so do not copy Google-native contents, parts, or generateContent request shapes directly. The simplest path is to keep using /v1/chat/completions.

Recommended endpoint: POST https://token.qyuanai.com/v1/chat/completions
Supported models: gemini-2.5-*, gemini-3-*, gemini-3.1-*
Minimum payload: model and messages, with optional max_tokens and stream
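
If existing code already builds Google-native contents, it can be translated into OpenAI-style messages before calling this gateway. The sketch below handles text parts only and joins multiple parts with a newline, which is a design choice rather than anything this platform prescribes. The name contents_to_messages is hypothetical.

```python
# Google-native roles mapped to OpenAI-style roles.
ROLE_MAP = {"user": "user", "model": "assistant"}


def contents_to_messages(contents: list) -> list:
    """Convert Google-native contents/parts into OpenAI-style messages."""
    messages = []
    for turn in contents:
        # A turn without an explicit role is treated as a user turn.
        role = ROLE_MAP[turn.get("role", "user")]
        texts = []
        for part in turn.get("parts", []):
            if "text" not in part:
                raise ValueError("only text parts are handled in this sketch")
            texts.append(part["text"])
        messages.append({"role": role, "content": "\n".join(texts)})
    return messages
```

The returned list goes straight into the messages field of a /v1/chat/completions payload.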

Gemini text model curl example

curl https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Reply with ok only."}
    ],
    "max_tokens": 64
  }'

Gemini image-capable model test example

On this platform, gemini-3.1-flash-image is currently best tested through the same chat/completions format first.

curl https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-flash-image",
    "messages": [
      {"role": "user", "content": "Reply with ok only."}
    ],
    "max_tokens": 64
  }'

Image generation

If you want direct image generation, do not call /v1/chat/completions. Use the dedicated /v1/images/generations endpoint instead.

Default payload: model + prompt + size + n. The recommended pure image model right now is gpt-image-2.

curl example

curl https://token.qyuanai.com/v1/images/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A minimalist product poster featuring a blue cube, white background, studio lighting",
    "n": 1,
    "size": "1024x1024"
  }'

Python example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://token.qyuanai.com/v1"
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A minimalist product poster featuring a blue cube, white background, studio lighting",
    size="1024x1024",
    n=1,
)

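image = result.data[0]
# Some gateways return a URL instead of base64 data, so handle both cases.
print(image.b64_json[:80] if image.b64_json else image.url)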
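
To keep the generated image rather than print a preview, the base64 string can be decoded and written to disk. This is a minimal sketch that assumes the response carries b64_json; the helper name save_b64_image and the output file name are illustrative.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 image string, write it to path, and return the byte count."""
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)
```

With the Python example above, this would be called as save_b64_image(result.data[0].b64_json, "poster.png"). The .png extension is an assumption about the returned image format.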

Streaming

Text models currently support SSE streaming responses. When testing with curl, add -N to disable output buffering so chunks appear as they arrive.

curl -N https://token.qyuanai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Output three lines: hello, qyuan, ai"}
    ]
  }'

The response is a standard SSE stream of data: {...} events and ends with data: [DONE].
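
A client that does not use an SDK has to parse this stream itself. The sketch below reads text fragments from an iterable of already-decoded SSE lines. It assumes each chunk follows the OpenAI streaming shape, choices[0].delta.content, which this guide implies but does not show. The name iter_stream_text is illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Assumed OpenAI-style chunk: the new text lives in delta.content.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With urllib, the HTTP response object yields bytes lines, so each line would need decoding first, for example (l.decode("utf-8") for l in resp).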

Responses API

If your SDK or app has already moved to OpenAI's newer unified interface, you can call /v1/responses directly.

Default payload: model + input + optional max_output_tokens.

curl https://token.qyuanai.com/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "input": "Reply with ok only.",
    "max_output_tokens": 32
  }'

If your project already relies heavily on chat.completions, there is no need to migrate to responses just because it is newer. Both are currently supported. Choose based on your existing code structure.
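
For teams that do want to try /v1/responses, the two payloads map onto each other closely. The sketch below covers only the three fields documented on this page. It assumes the gateway accepts a list of role/content messages as input, as OpenAI's Responses API does; this guide only demonstrates a plain string. The function name is hypothetical.

```python
def chat_to_responses_payload(chat_payload: dict) -> dict:
    """Translate a chat.completions payload into a /v1/responses payload."""
    payload = {
        "model": chat_payload["model"],
        # Assumption: a message list is accepted as "input" by this gateway.
        "input": chat_payload["messages"],
    }
    # max_tokens is renamed to max_output_tokens in the Responses shape.
    if "max_tokens" in chat_payload:
        payload["max_output_tokens"] = chat_payload["max_tokens"]
    return payload
```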

Claude Messages

If your current Claude integration already follows the Anthropic format, you can keep using it here.

Default payload: model + max_tokens + messages. The auth header must be x-api-key, not Authorization: Bearer.

curl https://token.qyuanai.com/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

We currently recommend claude-sonnet-4-5 or claude-sonnet-4-6 for most Claude usage. If you prefer to keep one shared integration style, you can also call Claude models through OpenAI-compatible /v1/chat/completions.
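
When switching between the two styles, the main structural difference is the system prompt: the Anthropic format takes it as a top-level system field rather than as a message with role "system". The sketch below builds a /v1/messages payload from OpenAI-style messages. It assumes the gateway passes the system field through unchanged, and the name to_claude_payload is illustrative.

```python
def to_claude_payload(model: str, messages: list, max_tokens: int = 1024) -> dict:
    """Build a /v1/messages payload from OpenAI-style messages."""
    # Collect system text separately; it does not belong inside "messages".
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # part of the default payload for this endpoint
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    return payload
```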

Referral rewards

To avoid ambiguity, the current referral reward policy is defined as follows:

WeCom support withdrawal notes

Referral rewards are accumulated inside the console. In-site usage and withdrawal follow the rules below.

  • The referrer receives 5% of the invited user's top-up amount.
  • Rewards are visible in the console and can be transferred into in-site balance for usage.
  • After cumulative referral earnings reach 100 USD, you can add WeCom support to request a withdrawal.

WeCom support contact: Hanson. Scan the QR code to add Hanson on WeCom. Once you reach the withdrawal threshold, support can help process the withdrawal.

Not recommended right now

To keep this documentation aligned with the platform's actual production-ready capabilities, the following are intentionally not covered here:

  • /v1/embeddings: there is currently no active embedding channel.
  • Audio, files, Assistants, fine-tuning, and other interfaces are not currently documented as primary external capabilities.

FAQ

1. What should I use as the Base URL?

For OpenAI-compatible SDKs, use https://token.qyuanai.com/v1.

2. Why do I get a “model not found” error?

This usually means the requested model name is not in this platform's current model list. Always trust the result of GET /v1/models, and do not reuse model names from other platforms.
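
A client can turn this advice into a friendlier error by comparing the requested name against the fetched list. The sketch below uses difflib from the Python standard library to suggest near matches. The function name suggest_models and the 0.6 similarity cutoff are illustrative choices.

```python
import difflib


def suggest_models(requested: str, available: list, limit: int = 3) -> list:
    """Return [] if the name is available, else the closest available names."""
    if requested in available:
        return []
    # Best matches come first; names below the similarity cutoff are dropped.
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)
```

Here available would be the list of IDs read from GET /v1/models.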

3. Do Claude models have to use /v1/messages?

No. You can also use /v1/chat/completions, as long as the model field is set to a Claude model name.

4. Why can’t I copy Google-native Gemini examples directly?

Because Gemini is normalized into an OpenAI-compatible interface on this platform. The recommended path is /v1/chat/completions with messages.

5. Which endpoint should I use for image generation?

Use /v1/images/generations for pure image generation. The currently recommended model is gpt-image-2.

6. How can I verify that my API key works?

The simplest method is to call GET /v1/models. If it returns a model list, authentication is working.