🔥 Distilling Tiny Embeddings. We're happy to build on the BERT Hash Series of models with this new set of fixed-dimensional tiny embedding models.
Ranging from 244K to 970K parameters and from 50 to 128 dimensions, these tiny models pack quite a punch.
Use cases include on-device semantic search, similarity comparisons, LLM chunking, and Retrieval-Augmented Generation (RAG). The advantage is that data never needs to leave the device while performance stays solid.
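If you want to try the on-device semantic search use case, here's a minimal sketch with sentence-transformers; the model id is just a placeholder, swap in one of the tiny embedding checkpoints.

```python
# Minimal on-device semantic search sketch (assumes sentence-transformers);
# the model id below is a placeholder for one of the tiny embedding models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-org/tiny-embedding-model")  # placeholder id

docs = [
    "How do I reset my password?",
    "Pricing details for the Pro plan",
    "Exporting your data to CSV",
]
doc_emb = model.encode(docs, normalize_embeddings=True)

query_emb = model.encode("I forgot my login credentials", normalize_embeddings=True)
scores = util.cos_sim(query_emb, doc_emb)   # cosine similarities, computed locally
print(docs[int(scores.argmax())])           # best match, no data leaves the device
```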
Our engineer Alan from the https://robonine.com team has assembled the mechanical frame of our 6-DoF manipulator prototype - without servo motors for now. At this stage we are evaluating how easy the structure is to assemble, checking for any mechanical play, and validating the kinematics.
Good news: the structure feels solid and Alan reports no detectable backlash so far.
Introducing Hawky-AI H1 4B PM: The First Open-Source LLM for Performance Marketing 🎯
Hey HF Community! 👋
Just released the first LLM fine-tuned specifically for Performance Marketing.

What is it? Gemma 3 4B distilled from Claude Opus 4.5 with expert-level marketing knowledge.

Covers:
📱 Meta Ads (campaign structure, bidding, scaling, creative fatigue)
🔍 Google Ads (Quality Score, Performance Max, lead gen)
📊 Measurement (ROAS vs MER, incrementality, LTV:CAC)
🎨 Creative Strategy (hook rates, A/B testing, funnel creative)

Why we built it: generic LLMs say "optimize your targeting", which isn't helpful. This model gives specific frameworks like "frequency at 4.5 + CTR drop = creative fatigue, here's the fix..."

Technical:
- Base: Gemma 3 4B
- Method: QLoRA (r=64)
- Teacher: Claude Opus 4.5
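For context, here's a rough sketch of what a QLoRA (r=64) setup on a Gemma 3 4B base could look like; the checkpoint id, lora_alpha, and target modules are my assumptions, not the released recipe.

```python
# Hedged QLoRA sketch: 4-bit quantized base model + LoRA adapters with r=64 (as stated above).
# Checkpoint id, lora_alpha and target_modules are assumptions, not the actual recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",          # assumed base checkpoint
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=64,                            # rank stated in the post
    lora_alpha=128,                  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()   # only the LoRA adapters are trainable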
- Generator: create your own dataset from scratch
- Converter: use existing datasets (Hugging Face support) with reasoning traces to match our SYNTH style
- DEEP Mode: multiple agents working together in various configurations
- Multi-turn Support: pass one DEEP run, let the model ask follow-up questions, and choose who should respond using SYNTH-like thinking
- Firebase/Firestore: download your data directly as a JSONL file or upload it to your Firestore (production mode)
- Data Preview: have data but unsure what's inside? Explore it directly!
- Verifier View: evaluate generated data, remove duplicates, assign ratings
Introducing Gliese-OCR-7B-Post1.0, a document content-structure retrieval VLM designed for content extraction (OCR) and summarization. This is the third model in the Camel Doc OCR VLM series, following Camel-Doc-OCR-062825. The new version fixes formal table reconstruction issues in both English and Chinese and performs well on long-context inference. It also shows significant improvements in LaTeX and Markdown rendering for OCR tasks.
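A hedged inference sketch using the generic transformers image-text-to-text pipeline; the org prefix in the model id is a placeholder and the prompt format is an assumption, so check the model card for the exact usage.

```python
# Hedged sketch: document OCR via the generic image-text-to-text pipeline.
# "<org>" is a placeholder; the exact prompt/processor format may differ.
from transformers import pipeline

ocr = pipeline("image-text-to-text", model="<org>/Gliese-OCR-7B-Post1.0")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample-invoice.png"},
        {"type": "text", "text": "Extract the document content as Markdown."},
    ],
}]
out = ocr(text=messages, max_new_tokens=1024)
print(out[0]["generated_text"])
```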
🚀 Big news! NeuroBLAST, the outstanding new architecture, has officially arrived on HF! After three intense months of training my 1.9-billion-parameter SLM on my trusty RTX 3090 Ti, I'm happy to announce the results. While it's not perfect just yet, I've dedicated countless hours to optimizing costs while crafting clever layer connections that mimic the brain's centers. Plus, I've introduced a new memory-like layer that's sure to turn heads! I can't wait to dive deep into this journey in my upcoming blog post. Stay tuned for the full scoop! 🔥
As we always use Transformers, it's helpful to understand RoPE (Rotary Position Embedding). Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows which token comes first, second, and so on.
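To make the rotation idea concrete, here's a minimal vanilla-RoPE sketch (my own toy illustration, using the common split-in-half pairing): each pair of feature dimensions is rotated by an angle proportional to the token's position.

```python
# Toy vanilla RoPE: rotate each (x1, x2) feature pair by a position-dependent
# angle theta_i = pos * base^(-i / (dim/2)). Illustration only, not a library API.
import torch

def rope(x, base=10000.0):
    seq_len, dim = x.shape            # x: (seq_len, dim), dim must be even
    half = dim // 2
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq_len, 1)
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (half,)
    angles = pos * inv_freq                                                # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied pairwise: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)   # 8 token positions, 64-dim query features
q_rot = rope(q)          # same shape, now position-aware
```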
Here are 8 types of RoPE that can be implemented in different cases:
4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923) Decomposes the positional embedding into 3 components: temporal, height, and width, so that positional features are aligned across modalities: text, images, and videos.
8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10 Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.
Here is a list of 15 types of attention mechanisms used in AI models:
3. Self-attention -> Attention Is All You Need (1706.03762) Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.
5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762) Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.
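As a quick illustration of points 3 and 5, here's a toy scaled dot-product self-attention plus a multi-head wrapper (my own minimal sketch; real layers use learned Q/K/V/output projections, which I skip here for brevity).

```python
# Toy self-attention / multi-head attention sketch; identity projections are used
# instead of learned Wq/Wk/Wv/Wo purely to keep the example short.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # each position scores every other position, then mixes the value vectors
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v

def multi_head_attention(x, num_heads):
    b, t, d = x.shape
    head_dim = d // num_heads
    # split the model dimension into independent heads: (b, heads, t, head_dim)
    q = k = v = x.view(b, t, num_heads, head_dim).transpose(1, 2)
    out = scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(b, t, d)   # concatenate heads back together

x = torch.randn(2, 5, 32)                 # batch of 2, 5 tokens, 32-dim embeddings
y = multi_head_attention(x, num_heads=4)  # same shape, context-mixed
```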
🚀 New smolagents update: Safer Local Python Execution! 🦾🔒
With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. 🔒
Here's why this matters & what you need to know! 🧵👇
1️⃣ Why is local execution risky? ⚠️ AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.
2️⃣ New Safety Layer in smolagents 🛡️ We now inspect every return value during execution:
✅ Allowed: safe built-in types (e.g., numbers, strings, lists)
❌ Blocked: dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)
(A conceptual sketch of this kind of check is at the end of this thread.)
4️⃣ Security Disclaimer ⚠️ 🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨 If you need true isolation, use a remote sandboxed executor like Docker or E2B.
5️⃣ The Best Practice: Use Sandboxed Execution 🔒 For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.
6️⃣ Upgrade Now & Stay Safe! 🚀 Check out the latest smolagents release and start building safer AI agents today.
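To illustrate the kind of check described in 2️⃣, here is a conceptual sketch (not the actual smolagents implementation): scan the code's AST for blocklisted modules and builtins before running it.

```python
# Conceptual sketch only, not smolagents' real interpreter: reject code whose AST
# imports blocklisted modules or references blocklisted builtins before execution.
import ast

BLOCKED_BUILTINS = {"exec", "eval", "__import__"}
BLOCKED_MODULES = {"os", "subprocess", "shutil", "sys"}

def check_code(source: str) -> None:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = {alias.name.split(".")[0] for alias in node.names}
            if isinstance(node, ast.ImportFrom) and node.module:
                names.add(node.module.split(".")[0])
            blocked = names & BLOCKED_MODULES
            if blocked:
                raise ValueError(f"blocked module(s): {sorted(blocked)}")
        if isinstance(node, ast.Name) and node.id in BLOCKED_BUILTINS:
            raise ValueError(f"blocked builtin: {node.id}")

check_code("total = sum(i * i for i in range(10))")   # passes
check_code("import os; os.system('rm -rf /')")        # raises ValueError
```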
✔️ A modification of the cross-entropy loss function designed specifically for training LLMs.
✔️ A twist on the standard cross-entropy loss that emphasizes outlier prediction errors and dynamically normalizes token-level variance.
✔️ More stable and efficient training, leading to models that generalize better.
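I don't know the exact formulation, but as a rough illustration of "emphasize outlier errors and normalize token-level variance", something along these lines could sit on top of standard cross-entropy; this is my own toy construction, not the released loss.

```python
# Toy illustration only (not the released loss): per-token cross-entropy with
# batch-level variance normalization and extra weight on outlier token errors.
import torch
import torch.nn.functional as F

def reweighted_cross_entropy(logits, targets, gamma=1.5, eps=1e-6):
    # logits: (batch, seq, vocab), targets: (batch, seq)
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (batch, seq)
    ce_norm = (ce - ce.mean()) / (ce.std() + eps)       # normalize token-level spread
    weights = torch.sigmoid(gamma * ce_norm).detach()   # up-weight outlier errors
    return (weights * ce).mean()

logits = torch.randn(2, 16, 1000, requires_grad=True)
targets = torch.randint(0, 1000, (2, 16))
loss = reweighted_cross_entropy(logits, targets)
loss.backward()
```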
Check it out, give it a spin, and let me know what you think!
Licensed under the Apache 2.0 license and ready to use. Happy training! 🔥🤗
Made a few improvements to the custom GRPO trainer:
- added a sequence similarity reward (seems to work; a rough sketch of the idea is below)
- improved vLLM support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to the HF Hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)
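For anyone curious what a sequence similarity reward might look like in a GRPO setup, here's my guess at the general shape (not the trainer's actual code): score each completion by its string similarity to a reference.

```python
# My guess at a sequence similarity reward (illustrative, not the trainer's code):
# one scalar in [0, 1] per completion, based on string similarity to a reference.
from difflib import SequenceMatcher

def sequence_similarity_reward(completions, references):
    return [
        SequenceMatcher(None, completion, reference).ratio()
        for completion, reference in zip(completions, references)
    ]

print(sequence_similarity_reward(
    ["The answer is 42."],
    ["#### 42"],
))
```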