# GigaChat3-10B-A1.8B
We present GigaChat3-10B-A1.8B, a dialogue model from the GigaChat family. The model is based on a Mixture-of-Experts (MoE) architecture with 10B total and 1.8B active parameters.

The architecture includes Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP), which optimize the model for high inference throughput.

The model is trained on top of our base version (GigaChat3-10B-A1.8B-base) with high-quality SFT data.

This version is intended for high-performance inference in fp8; the bf16 model is GigaChat3-10B-A1.8B.

More details are available in the Habr article.
## Model architecture

GigaChat3-10B-A1.8B uses a custom MoE architecture:
### Multi-head Latent Attention (MLA)
Instead of standard Multi-head Attention, the model uses MLA. MLA enables efficient inference by compressing the Key-Value (KV) cache into a latent vector, which significantly reduces memory requirements and speeds up processing.
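To make the mechanism concrete, here is a minimal PyTorch sketch of the latent-KV idea. It is illustrative only: the projection names (`kv_down`, `k_up`, `v_up`) and dimensions are assumptions rather than the model's actual implementation, and details such as RoPE handling are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Toy MLA-style attention: only a small latent vector is cached per token."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compression; this output is cached
        self.k_up = nn.Linear(d_latent, d_model)     # K reconstructed on the fly
        self.v_up = nn.Linear(d_latent, d_model)     # V reconstructed on the fly
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                     # (b, t, d_latent)
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        T = latent.shape[1]
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, T, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, T, self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask only during prefill; a decode step with a cache attends to all of it.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), latent              # the latent is the new KV cache

attn = LatentKVAttention()
_, cache = attn(torch.randn(1, 16, 512))        # prefill a 16-token prompt
y, cache = attn(torch.randn(1, 1, 512), cache)  # one decode step
print(y.shape, cache.shape)                     # (1, 1, 512), (1, 17, 64)
```

The payoff is the cache: each token stores a 64-dim latent instead of full K and V (2 × 512 floats here), a 16x reduction in this toy configuration.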
### Multi-Token Prediction (MTP)
The model is trained with a Multi-Token Prediction (MTP) objective. This lets it predict several tokens per forward pass, which speeds up generation by up to 40% through speculative/parallel decoding techniques, as sketched below.
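Conceptually, the serving engine lets the MTP head draft tokens ahead and then verifies them with the main model in a single forward pass, keeping the longest matching prefix. A hedged sketch of the greedy accept/reject rule (vLLM and SGLang implement this internally; the function below is purely illustrative):

```python
def accept_draft(draft_tokens: list[int], target_tokens: list[int]) -> list[int]:
    """Greedy speculative verification.

    draft_tokens:  k tokens proposed by the MTP draft head.
    target_tokens: k+1 tokens the main model would emit at those positions,
                   all obtained from one verification forward pass.
    Returns the longest matching prefix, plus the main model's token at the
    first mismatch (or the "bonus" (k+1)-th token if everything matched).
    """
    accepted = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            accepted.append(target)      # correction; the rest of the draft is discarded
            return accepted
        accepted.append(draft)
    accepted.append(target_tokens[-1])   # all k drafts accepted: take the bonus token
    return accepted

# With num_speculative_tokens=1 (as in the serving configs below), every
# verification pass yields 1 or 2 tokens instead of exactly 1:
print(accept_draft([42], [42, 7]))   # -> [42, 7]  draft accepted + bonus token
print(accept_draft([42], [13, 7]))   # -> [13]     draft rejected, corrected
```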
## Training data
The model was trained on 20T tokens. We added 10 languages, from Chinese and Arabic to Uzbek and Kazakh, and broadened the range of sources: books, academic data, and code and math datasets. All data passes through deduplication, language filtering, and automatic quality checks based on heuristics and classifiers.

Synthetic data made a key contribution to quality: we generated about 5.5 trillion tokens of it. The corpus includes question-answer pairs for texts, reverse-prompt chains for structuring data, LLM notes with in-text commentary from a model, and millions of synthetic math and competitive-programming problems with solutions (and synthetic tests) based on PromptCot.
## Inference
One of the key advantages of GigaChat3-10B-A1.8B is inference speed. The model (especially in MTP mode) delivers throughput comparable to that of significantly smaller dense models.

We measured with vLLM v0.11.0, in bfloat16, with batch_size=1.
Link to the code.
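For orientation, a typical vLLM serving benchmark that reports exactly these columns can be run as below. This is an illustrative invocation under assumed defaults, not the authors' exact script (see the link above); `--max-concurrency 1` approximates batch_size=1.

```bash
# Terminal 1: serve the model (illustrative; see the linked code for the exact setup)
vllm serve ai-sage/GigaChat3-10B-A1.8B-base --dtype bfloat16

# Terminal 2: random-prompt serving benchmark, one in-flight request at a time
vllm bench serve \
  --model ai-sage/GigaChat3-10B-A1.8B-base \
  --dataset-name random \
  --num-prompts 100 \
  --max-concurrency 1
```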
| Model | request_throughput | output_throughput | total_token_throughput | mean_ttft_ms |
|---|---|---|---|---|
| Qwen3-1.7B | 1.689 | 357.308 | 726.093 | 11.824 |
| mtp-GigaChat3-10B-A1.8B-base | 1.533 | 333.620 | 678.894 | 26.345 |
| GigaChat3-10B-A1.8B-base | 1.077 | 234.363 | 476.912 | 31.053 |
| Qwen3-4B | 0.978 | 206.849 | 420.341 | 14.947 |
| Qwen3-8B | 0.664 | 140.432 | 285.375 | 16.663 |
| YandexGPT-5-Lite-8B-pretrain | 0.641 | 147.305 | 300.269 | 16.711 |
## Benchmarks
Although the model has 10 billion parameters in total, its direct counterparts are models with 3–4 billion parameters. Thanks to its high generation speed, however, we also compare it against even more compact models.
| Metric | GigaChat 3 Lightning | Qwen3-1.7B-Instruct | Qwen3-4B-Instruct-2507 | SmolLM3 |
|---|---|---|---|---|
| MMLU_RU_FIVE_SHOT | 0.6833 | 0.4876 | 0.5972 | 0.4998 |
| RUBQ_ZERO_SHOT | 0.6516 | 0.2557 | 0.3170 | 0.6363 |
| MMLU_PRO_EN_FIVE_SHOT | 0.6061 | 0.4100 | 0.6849 | 0.5013 |
| MMLU_EN_FIVE_SHOT | 0.7403 | 0.6000 | 0.7080 | 0.5992 |
| BBH_THREE_SHOT | 0.4525 | 0.3317 | 0.7165 | 0.4161 |
| SuperGPQA | 0.2731 | 0.2092 | 0.3745 | 0.2459 |
| MATH_500_FOUR_SHOT | 0.7000 | 0.7520 | 0.8880 | 0.8020 |
| GPQA_COT_ZERO_SHOT | 0.3502 | 0.2651 | 0.5370 | 0.3704 |
| LiveCodeBench_ZERO_SHOT | 0.2031 | 0.0794 | 0.3046 | 0.1656 |
| HUMAN_EVAL_PLUS_ZERO_SHOT | 0.6951 | 0.6280 | 0.8780 | 0.7012 |
## How to reproduce the model's metrics

```bash
# lm-eval[api]==0.4.9.1
# sglang[all]==0.5.5
# or
# vllm==0.11.2
export HF_ALLOW_CODE_EVAL=1

# Bring up the sglang server
# 10B
python -m sglang.launch_server --model-path <path_to_model> --host 127.0.0.1 --port 30000 --dtype auto --mem-fraction-static 0.88 --trust-remote-code --allow-auto-truncate --speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 --speculative-num-draft-tokens 2

# MMLU-Pro check
python -m lm_eval --model sglang-generate --output_path <output_path> --batch_size 16 --model_args base_url=http://127.0.0.1:30000/generate,num_concurrent=16,tokenized_requests=True,max_length=131072,tokenizer=<path_to_model> --trust_remote_code --confirm_run_unsafe_code --num_fewshot 5 --tasks mmlu_pro
```
## Usage example (Quickstart)

1. transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "ai-sage/GigaChat3-10B-A1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)

messages = [
    {"role": "user", "content": "Prove the fixed-point theorem"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=1000)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=False)
print(result)
```
2. vLLM

Starting the server:

```bash
# vLLM's DeepGEMM kernels conflict with our hidden dim size.
# Fix: disable them via an env var (VLLM_USE_DEEP_GEMM=0).
VLLM_USE_DEEP_GEMM=0 vllm serve ai-sage/GigaChat3-10B-A1.8B \
    --dtype "auto" \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": false}'
```
Example request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat3-10B-A1.8B",
    "messages": [
      {
        "role": "user",
        "content": "Prove the fixed-point theorem"
      }
    ],
    "max_tokens": 400,
    "temperature": 0
  }'
```
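The same request can also be sent with the OpenAI Python client (a hedged equivalent of the curl call above; assumes the `openai` package and the server started as shown):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="ai-sage/GigaChat3-10B-A1.8B",
    messages=[{"role": "user", "content": "Prove the fixed-point theorem"}],
    max_tokens=400,
    temperature=0,
)
print(resp.choices[0].message.content)
```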
3. SGLang

Starting the server:

```bash
python -m sglang.launch_server \
    --model-path ai-sage/GigaChat3-10B-A1.8B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype auto \
    --mem-fraction-static 0.88 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 1 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 2
```
Example request:

```bash
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat3-10B-A1.8B",
    "messages": [
      {
        "role": "user",
        "content": "Prove the fixed-point theorem"
      }
    ],
    "max_tokens": 1000,
    "temperature": 0
  }'
```
## Function call
1. transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import json
import re

REGEX_FUNCTION_CALL_V3 = re.compile(r"function call<\|role_sep\|>\n(.*)$", re.DOTALL)
REGEX_CONTENT_PATTERN = re.compile(r"^(.*?)<\|message_sep\|>", re.DOTALL)


def parse_function_and_content(completion_str: str):
    """
    Using the regexes defined above, attempt to extract a function call and content.
    Returns (function_call_dict_or_None, content_str_or_None).
    """
    function_call = None
    content = None

    m_func = REGEX_FUNCTION_CALL_V3.search(completion_str)
    if m_func:
        try:
            function_call = json.loads(m_func.group(1))
            if isinstance(function_call, dict) and "name" in function_call and "arguments" in function_call:
                if not isinstance(function_call["arguments"], dict):
                    function_call = None
            else:
                function_call = None
        except json.JSONDecodeError:
            function_call = None
        # On a failed function-call parse, the raw completion is returned as content.
        return function_call, completion_str

    m_content = REGEX_CONTENT_PATTERN.search(completion_str)
    if m_content:
        content = m_content.group(1)
    else:
        # Fallback: everything before the first message_sep marker, if present.
        if "<|message_sep|>" in completion_str:
            content = completion_str.split("<|message_sep|>")[0]
        else:
            content = completion_str
    return function_call, content


model_name = "ai-sage/GigaChat3-10B-A1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get information about the current weather in the specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., Moscow, Kazan)."
                    }
                },
                "required": ["city"]
            }
        }
    }
]
messages = [
    {"role": "user", "content": "What is the weather in Moscow right now?"}
]

input_tensor = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=1000)

result = parse_function_and_content(tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=False))[0]
print(result)
```
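A quick sanity check of the parser on a hand-written completion; the raw string below merely mimics the service-token format the regexes expect and is not real model output:

```python
raw = 'function call<|role_sep|>\n{"name": "get_weather", "arguments": {"city": "Moscow"}}'
call, _ = parse_function_and_content(raw)
print(call)  # {'name': 'get_weather', 'arguments': {'city': 'Moscow'}}
```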
2. vLLM

Build a dev version of vLLM (commit >= 21bb323).

Starting the server:

```bash
# vLLM's DeepGEMM kernels conflict with our hidden dim size.
# Fix: disable them via an env var (VLLM_USE_DEEP_GEMM=0).
VLLM_USE_DEEP_GEMM=0 vllm serve ai-sage/GigaChat3-10B-A1.8B \
    --dtype "auto" \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": false}' \
    --enable-auto-tool-choice \
    --tool-call-parser gigachat3
```
Example request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat3-10B-A1.8B",
    "temperature": 0,
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Moscow right now?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get information about the current weather in the specified city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "City name (e.g., Moscow, Kazan)."
              }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
```
3. SGLang

Build a dev version from this branch: https://github.com/sgl-project/sglang/pull/14765.

Starting the server:

```bash
python -m sglang.launch_server \
    --model-path ai-sage/GigaChat3-10B-A1.8B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype auto \
    --mem-fraction-static 0.88 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 1 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 2 \
    --tool-call-parser gigachat3
```
Example request:

```bash
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-sage/GigaChat3-10B-A1.8B",
    "temperature": 0,
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Moscow right now?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get information about the current weather in the specified city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "City name (e.g., Moscow, Kazan)."
              }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
```