Qwen3 14B Claude-Distill
FastAPI proxy that serves inference through a vLLM backend (port 8093). The model is Qwen3 14B distilled from Claude, running on an RTX 5060 Ti 16GB (CUDA device 3).
Active
The most capable model in the system.
Well suited to coding, debugging, reasoning, and analysis. OpenAI-compatible, with tool calling. Uses VLLM_API_KEY for authentication.
Base URL
https://pnt.badt.vn/agenticcoder
Authentication
Bearer token (VLLM_API_KEY):
Authorization: Bearer <VLLM_API_KEY>
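A minimal sketch of building this header in Python. Reading the key from an environment variable named VLLM_API_KEY is an assumption; the page names the key but does not say where it is stored. The helper name `auth_headers` is hypothetical.

```python
from __future__ import annotations

import os


def auth_headers(api_key: str | None = None) -> dict[str, str]:
    """Build the Authorization header expected by the proxy.

    Falls back to the VLLM_API_KEY environment variable (an assumption:
    the docs name the key, not where it is kept).
    """
    key = api_key or os.environ.get("VLLM_API_KEY")
    if not key:
        raise RuntimeError("VLLM_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dict as the `headers` argument of whichever HTTP client you use.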
API Endpoints
POST /v1/chat/completions (OpenAI Compatible)
OpenAI-compatible request format; the proxy forwards the call to the vLLM backend on port 8093.
| Param | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | qwen3-claude-distill |
| messages | array | Yes | Message objects |
| max_tokens | int | No | Default: 4096 |
| temperature | float | No | 0.0 - 2.0, default: 0.7 |
| stream | bool | No | SSE streaming |
| tools | array | No | Function/tool definitions |
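The parameters above could be assembled into a request roughly as follows, using only the Python standard library. This is a sketch under assumptions: the endpoint path is appended directly to the Base URL, the `get_weather` tool is hypothetical, and the streaming parser assumes the proxy passes through the usual OpenAI-style SSE chunks (`data: {...}` lines ending with `data: [DONE]`).

```python
import json
import urllib.request

BASE_URL = "https://pnt.badt.vn/agenticcoder"
MODEL = "qwen3-claude-distill"

# Hypothetical tool definition, for illustration only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_request(messages, api_key, *, max_tokens=4096,
                       temperature=0.7, stream=False, tools=None):
    """Build (but do not send) a POST to /v1/chat/completions."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    payload = {
        "model": MODEL,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
    if tools:
        payload["tools"] = tools
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text
```

To send a request, pass the result to `urllib.request.urlopen(req, timeout=60)`. With `stream=True` the response object can be iterated line by line and fed to `iter_stream_text`; without streaming, an OpenAI-style body would carry the reply in `choices[0].message.content`.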
POST /chat (Native)
Native endpoint with support for system_prompt, streaming, and tool calling.
| Param | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Message objects |
| system_prompt | string | No | System instruction |
| max_tokens | int | No | Default: 4096 |
| temperature | float | No | Default: 0.7 |
| stream | bool | No | SSE |
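A sketch of a request to the native endpoint, limited to the parameters listed in the table. It assumes that /chat sits directly under the Base URL and accepts the same Bearer token; the page does not document the response body, so the sketch stops at building the request. The function name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pnt.badt.vn/agenticcoder"


def build_native_chat_request(messages, api_key, *, system_prompt=None,
                              max_tokens=None, temperature=None, stream=None):
    """Build (but do not send) a POST to the native /chat endpoint.

    Optional fields left as None are omitted so the server-side
    defaults (max_tokens 4096, temperature 0.7) apply.
    """
    payload = {"messages": messages}
    optional = {
        "system_prompt": system_prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Unlike the OpenAI-compatible endpoint, the system instruction travels in its own `system_prompt` field rather than as a `system` message, and no `model` field is listed.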
GET /health (Utility)
Reports the health of the service and the vLLM backend.
GET /v1/models (OpenAI)
Lists the available models.
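The two GET endpoints could be called along these lines. The response shapes are assumptions: the health check only looks at the HTTP status because the page does not describe the body, and `model_ids` expects the usual OpenAI-style list (`{"data": [{"id": ...}]}`). Whether /health requires the token is not stated; sending it is assumed to be harmless. All function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pnt.badt.vn/agenticcoder"


def build_get_request(path, api_key):
    """Build a GET request for a path under the Base URL."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def is_healthy(api_key, timeout=5.0):
    """Return True when GET /health answers with HTTP 200."""
    try:
        req = build_get_request("/health", api_key)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


def model_ids(models_body):
    """Extract model ids from an OpenAI-style model list body."""
    return [entry["id"] for entry in models_body.get("data", [])]


def list_models(api_key, timeout=5.0):
    """Fetch GET /v1/models and return the model ids."""
    req = build_get_request("/v1/models", api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Checking `is_healthy` before sending a long chat request gives a quick way to tell a proxy outage apart from a slow generation.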
Technical specifications
| Spec | Value |
|---|---|
| Model | qwen3-claude-distill (14B) |
| Context | ~14K tokens |
| vLLM Port | 8093 (CUDA 3) |
| Proxy Port | 8045 (FastAPI) |
| GPU | RTX 5060 Ti 16GB |
| Auth | VLLM_API_KEY |
Capabilities
| Capability | Description |
|---|---|
| Coding | Code generation and debugging |
| Reasoning | Deep reasoning |
| Debug | Troubleshooting |
| Chat | Multi-turn |
| Tool Calling | Function calling |