Stage 0 — CPT (Domain Pretrain)
Purpose: build Russian real-estate domain knowledge into the base model before supervised fine-tuning (SFT).
Data: all available raw Russian text (property listings, chats, documents).
Language/Domain: Russian, real-estate operations.
Context: long-context ready (multi-turn dialogs, long docs).
Output style: neutral, factual, tool-agnostic.
Intended use: foundation for downstream stages.
Known limits: not instruction-aligned, so it may continue text instead of following a request; may produce generic answers.
Safety: filtered for PII where possible; still requires downstream guardrails.
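The card does not say how the PII filtering was done. As a rough illustration only, a regex-based masking pass over raw text might look like the sketch below. The patterns, the placeholder tokens `[EMAIL]` and `[PHONE]`, and the function name `mask_pii` are all assumptions made for this example, not the pipeline actually used for this model.

```python
import re

# Hypothetical patterns: the model card does not describe the real filter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Russian-style phone numbers, e.g. "+7 (495) 123-45-67" or "8 495 123 45 67".
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+7|8)[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}(?!\d)"
)


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "Звоните +7 (495) 123-45-67 или пишите ivan@example.com"
    print(mask_pii(sample))
```

A pattern-based pass like this misses names, street addresses, and unusual number formats, which is consistent with the "where possible" caveat and the need for downstream guardrails.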