API · v1
neonproxy.space documentation
Every frontier model, one endpoint.
Drop-in OpenAI-compatible gateway routing to Anthropic, OpenAI and Zhipu upstreams. Pay per token. No subscription, no rate-limit theatrics.
1. Get an API key: sign up, name a key, copy sk-...
2. Drop it in: point any OpenAI SDK at neonproxy.space/v1
3. Ship: stream responses, swap models, watch usage live
Create an account
- Go to https://neonproxy.space/sign-up
- Pick a username and password.
- Verify your email (if email verification is enabled by the operator).
- Log in at https://neonproxy.space/sign-in — you'll land on the dashboard.
Get an API key
- Open Keys in the sidebar (/console/keys).
- Click New key.
- Optionally cap the per-key spend (e.g. $200 / month) and restrict it to specific models.
- Copy the key (sk-...). It's only shown once.
Store it in your environment:
export NEONPROXY_KEY=sk-...
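Because the key is shown only once, it can help to fail fast when the variable is missing. The sketch below is one way to do that in Python; `load_key` is a hypothetical helper, and the `sk-` prefix check is an assumption based on the key format shown above.

```python
import os


def load_key(env=os.environ) -> str:
    """Return the gateway key from NEONPROXY_KEY, or raise a clear error."""
    key = env.get("NEONPROXY_KEY", "").strip()
    if not key:
        raise RuntimeError("NEONPROXY_KEY is not set; export it before running")
    # Assumption: keys start with "sk-", as in the dashboard example.
    if not key.startswith("sk-"):
        raise RuntimeError("NEONPROXY_KEY does not look like an sk-... key")
    return key
```

Calling `load_key()` once at startup surfaces a missing key before the first request is made.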
Drop it into your code
The gateway speaks the OpenAI wire format out of the box — point any OpenAI-compatible SDK at it and your existing code keeps working.
Node / TypeScript
import OpenAI from 'openai'

// Reuse the stock OpenAI client; only the base URL and key change
const ai = new OpenAI({
  baseURL: 'https://neonproxy.space/v1',
  apiKey: process.env.NEONPROXY_KEY,
})

const res = await ai.chat.completions.create({
  model: 'claude-opus-4.8',
  messages: [{ role: 'user', content: 'Explain rate limits in one sentence.' }],
})
console.log(res.choices[0].message.content)
Python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://neonproxy.space/v1",
    api_key=os.environ["NEONPROXY_KEY"],
)

res = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hi"}],
)
print(res.choices[0].message.content)
curl
curl https://neonproxy.space/v1/chat/completions \
  -H "Authorization: Bearer $NEONPROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hi"}]
  }'
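If no SDK is available, the curl call above can be mirrored with the Python standard library. This is a minimal sketch: `build_chat_request` is a hypothetical helper that only constructs the request, so it can be inspected without touching the network.

```python
import json
import urllib.request

BASE_URL = "https://neonproxy.space/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the response body with `json.load`.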
Streaming
Every model supports streamed responses. Add "stream": true:
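const messages = [{ role: 'user', content: 'Explain rate limits in one sentence.' }]

const stream = await ai.chat.completions.create({
  model: 'claude-opus-4.7',
  messages,
  stream: true,
})
for await (const chunk of stream) {
  // Some chunks carry no text (role or finish markers), so fall back to ''
  process.stdout.write(chunk.choices[0]?.delta?.content || '')
}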
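Without an SDK, a streamed response in the OpenAI wire format arrives as server-sent events: `data: {...}` lines, ending with `data: [DONE]`. The sketch below assumes the gateway follows that convention unchanged; `iter_text_deltas` is a hypothetical helper that pulls the text fragments out of such lines.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

Feeding it the decoded lines of the HTTP response body and joining the results reconstructs the full reply.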
Anthropic / Gemini wire formats
If your code uses the Anthropic SDK directly:
import Anthropic from '@anthropic-ai/sdk'

// The Anthropic SDK appends /v1/messages itself, so the base URL has no /v1 suffix
const client = new Anthropic({
  baseURL: 'https://neonproxy.space',
  apiKey: process.env.NEONPROXY_KEY,
})

const msg = await client.messages.create({
  model: 'claude-opus-4.8',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hi' }],
})
Gemini-style endpoints work too:
POST https://neonproxy.space/v1beta/models/gemini-2.5-pro:generateContent
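As an illustration of what such a request could look like, the sketch below builds the URL and a minimal body in the public Gemini `generateContent` shape (`contents` → `parts` → `text`). `build_generate_content` is a hypothetical helper. This page does not say how Gemini-style requests are authenticated, so headers are deliberately left out.

```python
import json

BASE_URL = "https://neonproxy.space"


def build_generate_content(model: str, prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a single-turn generateContent call."""
    url = f"{BASE_URL}/v1beta/models/{model}:generateContent"
    body = {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]},
        ]
    }
    return url, json.dumps(body)
```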
Models & pricing
All prices are in USD per 1M tokens. There is no subscription and no monthly minimum; billing is based on actual usage.
| Model | Input | Output | Context |
|---|---|---|---|
| claude-opus-4.8 | $0.40 | $0.40 | 200K |
| claude-opus-4.7 | $0.40 | $0.40 | 200K |
| claude-sonnet-4.6 | $0.20 | $0.20 | 200K |
| gpt-5.5 | $0.20 | $0.20 | 128K |
| gpt-5.4 | $0.20 | $0.20 | 128K |
| glm-5.2 | $0.10 | $0.10 | 128K |
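Since prices are quoted per 1M tokens, the cost of a single request is the token counts scaled by those rates. A minimal sketch, with prices copied from the table and `estimate_cost` as a hypothetical helper:

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "claude-opus-4.8": (Decimal("0.40"), Decimal("0.40")),
    "claude-opus-4.7": (Decimal("0.40"), Decimal("0.40")),
    "claude-sonnet-4.6": (Decimal("0.20"), Decimal("0.20")),
    "gpt-5.5": (Decimal("0.20"), Decimal("0.20")),
    "gpt-5.4": (Decimal("0.20"), Decimal("0.20")),
    "glm-5.2": (Decimal("0.10"), Decimal("0.10")),
}
MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    price_in, price_out = PRICES[model]
    return (price_in * input_tokens + price_out * output_tokens) / MILLION
```

In the OpenAI response format the token counts are reported as `usage.prompt_tokens` and `usage.completion_tokens`; that the gateway returns them unchanged is an assumption here.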
More code examples
Tool use (function calling)
const res = await ai.chat.completions.create({
  model: 'claude-opus-4.8',
  messages: [{ role: 'user', content: 'Weather in Berlin?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
})
if (res.choices[0].message.tool_calls) {
  // Run each requested tool, then send the results back as follow-up
  // messages with role 'tool' and the matching tool_call_id
}
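The follow-up step can be sketched as below, assuming the standard OpenAI tool-calling message shape. `tool_result_messages` and the `handlers` mapping are hypothetical names. The function takes the assistant message as a plain dict and returns the messages to append before the next request.

```python
import json


def tool_result_messages(assistant_message: dict, handlers: dict) -> list:
    """Build the messages to append after a response containing tool_calls."""
    # Echo the assistant turn first so each tool result has a call to refer to.
    follow_up = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = handlers[fn["name"]](**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return follow_up
```

Append the returned list to the original `messages` and call `chat.completions.create` again to get the model's final answer. With the `openai` Python SDK, `res.choices[0].message.model_dump()` should produce a dict of the expected shape.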
Vision (multimodal input)
const res = await ai.chat.completions.create({
  model: 'claude-sonnet-4.6',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image_url', image_url: { url: 'https://example.com/cat.jpg' } },
    ],
  }],
})
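For a local file rather than a public URL, the OpenAI format also accepts a base64 `data:` URL in the same `image_url` field. Whether the gateway forwards data URLs to every upstream is an assumption here. `image_part` is a hypothetical helper that builds the content part from raw bytes.

```python
import base64


def image_part(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

The returned dict goes in the `content` list alongside the text part, for example `image_part(open("cat.jpg", "rb").read())`.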
JSON-mode response
const res = await ai.chat.completions.create({
  model: 'gpt-5.5',
  response_format: { type: 'json_object' },
  messages: [
    { role: 'system', content: 'Always respond with JSON.' },
    { role: 'user', content: 'List 3 fruits.' },
  ],
})
// content is typed string | null, so fall back to an empty object
const data = JSON.parse(res.choices[0].message.content ?? '{}')
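Parsing the reply can still fail if a model returns nothing or wraps its JSON in a Markdown code fence. That not every upstream enforces JSON mode strictly is an assumption, not something this page states. A defensive parser might look like this, with `parse_json_reply` as a hypothetical helper:

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this snippet readable


def parse_json_reply(content):
    """Parse a model reply as JSON, tolerating a surrounding code fence."""
    if not content or not content.strip():
        raise ValueError("model returned an empty reply")
    text = content.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag)...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ...and everything from the closing fence onward.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)
```

Invalid JSON still raises `json.JSONDecodeError`, a subclass of `ValueError`, so one `except ValueError` covers both failure modes.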
Embeddings (where supported)
const res = await ai.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'hello world',
})
console.log(res.data[0].embedding)
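Embedding vectors are usually compared rather than printed. A common choice is cosine similarity, sketched here in plain Python. `cosine_similarity` is an illustrative helper, not part of any SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Values near 1 indicate similar texts; values near 0 indicate unrelated ones.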
FAQ
What providers do you route to?
Anthropic for claude-*, OpenAI for gpt-*, Zhipu for glm-*. Region failover is handled transparently.
Is there a rate limit?
Per-key rate limits are configurable in the dashboard. The default is generous enough for most production traffic; you only hit a ceiling if you set one.
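If you do set a ceiling, clients should back off and retry when a request is rejected. The sketch below shows exponential backoff with jitter. `RateLimited` is a hypothetical stand-in for whatever your client raises on a rate-limit response (the `openai` SDK raises `openai.RateLimitError` on HTTP 429). That the gateway signals limits with 429 is an assumption.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the client's rate-limit error."""


def with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
            # Jitter of up to 50% keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Wrap the request in a zero-argument callable, for example `with_backoff(lambda: client.chat.completions.create(...))`, after mapping the SDK's error to `RateLimited` or catching it directly.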
What about latency?
Routing adds roughly 5–15 ms on top of the upstream provider's latency. For streamed responses, time to first token is effectively the same as calling the provider directly.
Can I bring my own API keys (BYOK)?
Contact support — BYOK is supported for enterprise accounts.
Where can I see usage?
Dashboard at neonproxy.space/console/dashboard — real-time spend, per-key breakdown, per-model token counts.
What if a model returns an error?
The gateway retries transient errors automatically. Persistent errors propagate to your client with the upstream's error code unchanged.
Refunds / SLAs?
See User Agreement.
Support
- Email: support@neonproxy.space
- Status: neonproxy.space/api/status
- Source: QuantumNous/new-api (AGPL v3.0)
Last updated 2026-06-22. This site is operated under neonproxy.space terms.