🔥 Distilling Tiny Embeddings. We're happy to build on the BERT Hash Series of models with this new set of fixed-dimensional tiny embedding models.
Ranging from 244K to 970K parameters and from 50 to 128 dimensions, these tiny models pack quite a punch.
Use cases include on-device semantic search, similarity comparisons, LLM chunking, and Retrieval-Augmented Generation (RAG). The advantage is that data never needs to leave the device while performance stays solid.
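If you want to try the on-device semantic search use case, here's a minimal sketch with sentence-transformers; the model id is just a placeholder, swap in one of the tiny embedding checkpoints.

```python
# Minimal on-device semantic search sketch (assumes sentence-transformers);
# the model id below is a placeholder for one of the tiny embedding models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-org/tiny-embedding-model")  # placeholder id

docs = [
    "How do I reset my password?",
    "Pricing details for the Pro plan",
    "Exporting your data to CSV",
]
doc_emb = model.encode(docs, normalize_embeddings=True)

query_emb = model.encode("I forgot my login credentials", normalize_embeddings=True)
scores = util.cos_sim(query_emb, doc_emb)   # cosine similarities, computed locally
print(docs[int(scores.argmax())])           # best match, no data leaves the device
```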
Our engineer Alan from the https://robonine.com team has assembled the mechanical frame of our 6-DoF manipulator prototype - without servo motors for now. At this stage we are evaluating how easy the structure is to assemble, checking for any mechanical play, and validating the kinematics.
Good news: the structure feels solid and Alan reports no detectable backlash so far.
Introducing Hawky-AI H1 4B PM: The First Open-Source LLM for Performance Marketing 🎯
Hey HF Community! 👋
Just released the first LLM fine-tuned specifically for Performance Marketing.

What is it? Gemma 3 4B distilled from Claude Opus 4.5 with expert-level marketing knowledge.

Covers:
📱 Meta Ads (campaign structure, bidding, scaling, creative fatigue)
🔍 Google Ads (Quality Score, Performance Max, lead gen)
📊 Measurement (ROAS vs MER, incrementality, LTV:CAC)
🎨 Creative Strategy (hook rates, A/B testing, funnel creative)

Why we built it: generic LLMs say "optimize your targeting", which isn't helpful. This model gives specific frameworks like "frequency at 4.5 + CTR drop = creative fatigue, here's the fix..."

Technical:
- Base: Gemma 3 4B
- Method: QLoRA (r=64)
- Teacher: Claude Opus 4.5
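For context, here's a rough sketch of what a QLoRA (r=64) setup on a Gemma 3 4B base could look like; the checkpoint id, lora_alpha, and target modules are my assumptions, not the released recipe.

```python
# Hedged QLoRA sketch: 4-bit quantized base model + LoRA adapters with r=64 (as stated above).
# Checkpoint id, lora_alpha and target_modules are assumptions, not the actual recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",          # assumed base checkpoint
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=64,                            # rank stated in the post
    lora_alpha=128,                  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()   # only the LoRA adapters are trainable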
- Generator: create your own dataset from scratch
- Converter: use existing datasets (Hugging Face support) with reasoning traces to match our SYNTH style
- DEEP Mode: multiple agents working together in various configurations
- Multi-turn Support: pass one DEEP run, let the model ask follow-up questions, and choose who should respond using SYNTH-like thinking
- Firebase/Firestore: download your data directly as a JSONL file or upload it to your Firestore (production mode)
- Data Preview: have data but unsure what's inside? Explore it directly!
- Verifier View: evaluate generated data, remove duplicates, assign ratings
Introducing Gliese-OCR-7B-Post1.0, a document content-structure retrieval VLM designed for content extraction (OCR) and summarization. This is the third model in the Camel Doc OCR VLM series, following Camel-Doc-OCR-062825. The new version fixes formal table reconstruction issues in both English and Chinese and performs well on long-context inference. It also shows significant improvements in LaTeX and Markdown rendering for OCR tasks.
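A hedged inference sketch using the generic transformers image-text-to-text pipeline; the org prefix in the model id is a placeholder and the prompt format is an assumption, so check the model card for the exact usage.

```python
# Hedged sketch: document OCR via the generic image-text-to-text pipeline.
# "<org>" is a placeholder; the exact prompt/processor format may differ.
from transformers import pipeline

ocr = pipeline("image-text-to-text", model="<org>/Gliese-OCR-7B-Post1.0")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample-invoice.png"},
        {"type": "text", "text": "Extract the document content as Markdown."},
    ],
}]
out = ocr(text=messages, max_new_tokens=1024)
print(out[0]["generated_text"])
```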
🚀 Big news! NeuroBLAST, the outstanding new architecture, has officially arrived on HF! After three intense months of training my 1.9-billion-parameter SLM on my trusty RTX 3090 Ti, I'm happy to announce the results. While it's not perfect just yet, I've dedicated countless hours to optimizing costs while crafting clever layer connections that mimic the brain's centers. Plus, I've introduced a new memory-like layer that's sure to turn heads! I can't wait to dive deep into this journey in my upcoming blog post. Stay tuned for the full scoop! 🔥
As we always use Transformers, it's helpful to understand RoPE (Rotary Position Embedding). Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows which token comes first, second, and so on.
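To make the rotation idea concrete, here's a minimal vanilla-RoPE sketch (my own toy illustration, using the common split-in-half pairing): each pair of feature dimensions is rotated by an angle proportional to the token's position.

```python
# Toy vanilla RoPE: rotate each (x1, x2) feature pair by a position-dependent
# angle theta_i = pos * base^(-i / (dim/2)). Illustration only, not a library API.
import torch

def rope(x, base=10000.0):
    seq_len, dim = x.shape            # x: (seq_len, dim), dim must be even
    half = dim // 2
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq_len, 1)
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (half,)
    angles = pos * inv_freq                                                # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied pairwise: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)   # 8 token positions, 64-dim query features
q_rot = rope(q)          # same shape, now position-aware
```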
Here are 8 types of RoPE that can be implemented in different cases:
4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923) Decomposes the positional embedding into 3 components: temporal, height, and width, so that positional features are aligned across modalities: text, images, and videos.
8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10 Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.
Here is a list of 15 types of attention mechanisms used in AI models:
3. Self-attention -> Attention Is All You Need (1706.03762) Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.
5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762) Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.
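As a quick illustration of points 3 and 5, here's a toy scaled dot-product self-attention plus a multi-head wrapper (my own minimal sketch; real layers use learned Q/K/V/output projections, which I skip here for brevity).

```python
# Toy self-attention / multi-head attention sketch; identity projections are used
# instead of learned Wq/Wk/Wv/Wo purely to keep the example short.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # each position scores every other position, then mixes the value vectors
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v

def multi_head_attention(x, num_heads):
    b, t, d = x.shape
    head_dim = d // num_heads
    # split the model dimension into independent heads: (b, heads, t, head_dim)
    q = k = v = x.view(b, t, num_heads, head_dim).transpose(1, 2)
    out = scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(b, t, d)   # concatenate heads back together

x = torch.randn(2, 5, 32)                 # batch of 2, 5 tokens, 32-dim embeddings
y = multi_head_attention(x, num_heads=4)  # same shape, context-mixed
```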
🚀 New smolagents update: Safer Local Python Execution! 🦾🔒
With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. 🔒
Here's why this matters & what you need to know! 🧵👇
1️⃣ Why is local execution risky? ⚠️ AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.
2️⃣ New Safety Layer in smolagents 🛡️ We now inspect every return value during execution:
✅ Allowed: safe built-in types (e.g., numbers, strings, lists)
❌ Blocked: dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)
(A conceptual sketch of this kind of check is at the end of this thread.)
4️⃣ Security Disclaimer ⚠️ 🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨 If you need true isolation, use a remote sandboxed executor like Docker or E2B.
5️⃣ The Best Practice: Use Sandboxed Execution 🔒 For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.
6️⃣ Upgrade Now & Stay Safe! 🚀 Check out the latest smolagents release and start building safer AI agents today.
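To illustrate the kind of check described in 2️⃣, here is a conceptual sketch (not the actual smolagents implementation): scan the code's AST for blocklisted modules and builtins before running it.

```python
# Conceptual sketch only, not smolagents' real interpreter: reject code whose AST
# imports blocklisted modules or references blocklisted builtins before execution.
import ast

BLOCKED_BUILTINS = {"exec", "eval", "__import__"}
BLOCKED_MODULES = {"os", "subprocess", "shutil", "sys"}

def check_code(source: str) -> None:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = {alias.name.split(".")[0] for alias in node.names}
            if isinstance(node, ast.ImportFrom) and node.module:
                names.add(node.module.split(".")[0])
            blocked = names & BLOCKED_MODULES
            if blocked:
                raise ValueError(f"blocked module(s): {sorted(blocked)}")
        if isinstance(node, ast.Name) and node.id in BLOCKED_BUILTINS:
            raise ValueError(f"blocked builtin: {node.id}")

check_code("total = sum(i * i for i in range(10))")   # passes
check_code("import os; os.system('rm -rf /')")        # raises ValueError
```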
✔️ A modification of the cross-entropy loss function designed specifically for training LLMs.
✔️ A twist on the standard cross-entropy loss that emphasizes outlier prediction errors and dynamically normalizes token-level variance.
✔️ More stable and efficient training, leading to models that generalize better.
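I don't know the exact formulation, but as a rough illustration of "emphasize outlier errors and normalize token-level variance", something along these lines could sit on top of standard cross-entropy; this is my own toy construction, not the released loss.

```python
# Toy illustration only (not the released loss): per-token cross-entropy with
# batch-level variance normalization and extra weight on outlier token errors.
import torch
import torch.nn.functional as F

def reweighted_cross_entropy(logits, targets, gamma=1.5, eps=1e-6):
    # logits: (batch, seq, vocab), targets: (batch, seq)
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (batch, seq)
    ce_norm = (ce - ce.mean()) / (ce.std() + eps)       # normalize token-level spread
    weights = torch.sigmoid(gamma * ce_norm).detach()   # up-weight outlier errors
    return (weights * ce).mean()

logits = torch.randn(2, 16, 1000, requires_grad=True)
targets = torch.randint(0, 1000, (2, 16))
loss = reweighted_cross_entropy(logits, targets)
loss.backward()
```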
Check it out, give it a spin, and let me know what you think!
Licensed under the Apache 2.0 license and ready to use. Happy training! 🔥🤗
Made a few improvements to the custom GRPO trainer:
- added a sequence similarity reward (seems to work; a rough sketch of the idea is below)
- improved vLLM support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to the HF Hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)
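For anyone curious what a sequence similarity reward might look like in a GRPO setup, here's my guess at the general shape (not the trainer's actual code): score each completion by its string similarity to a reference.

```python
# My guess at a sequence similarity reward (illustrative, not the trainer's code):
# one scalar in [0, 1] per completion, based on string similarity to a reference.
from difflib import SequenceMatcher

def sequence_similarity_reward(completions, references):
    return [
        SequenceMatcher(None, completion, reference).ratio()
        for completion, reference in zip(completions, references)
    ]

print(sequence_similarity_reward(
    ["The answer is 42."],
    ["#### 42"],
))
```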