All HF Hub posts

unmodeled-tyler posted an update 3 days ago
Happy New Year, Hugging Face!

It's been a crazy year for me! This year I launched VANTA Research as a solo operator and managed to push out 14 original open source finetunes and 5 datasets in the span of about 4 months, completely on my own.

The reception has been far better than I ever anticipated, and I sincerely appreciate everyone who's checked out my work so far.

The good news is, I'm just getting started! In 2026 you can expect even more original models from VANTA Research, more open source datasets, and maybe some other cool things as well? 👀

2026 is gonna be big for AI in general, and I can't wait to experience it with all of you!
Sri-Vigneshwar-DJ posted an update 2 days ago
The recent update to Meta's ad algorithm is very difficult to crack, and even the latest models struggle to keep up with it. To address this, we've created a small experimental dataset for fine-tuning models to better tackle Meta's Andromeda algorithm: Sri-Vigneshwar-DJ/hawky-ai-andromeda-dataset
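
For anyone who wants to poke at it, here is a quick way to inspect the dataset with the datasets library. The "train" split is an assumption, since the post doesn't describe the dataset's schema:

```python
# Hypothetical quick look at the dataset named above; the "train" split
# is an assumption, as the post doesn't describe the schema.
from datasets import load_dataset

ds = load_dataset("Sri-Vigneshwar-DJ/hawky-ai-andromeda-dataset", split="train")
print(ds)      # features and row count
print(ds[0])   # one sample record
```
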
mindchain posted an update 1 day ago
The Architecture of 2026: Beyond the Token Trap 🚀

We are witnessing a tectonic shift in Transformer architecture. It’s no longer just about "predicting the next token"—it’s about executing latent plans on a high-speed data highway.

What happens when we combine DeepSeek’s stability with Google’s strategic intelligence?

1️⃣ The Infrastructure: DeepSeek's mHC
Moving from a single-lane residual stream to a multi-lane highway. Using the Birkhoff Polytope, mHC ensures mathematical stability (identity mapping) while routing specialized data through dedicated lanes (a toy sketch follows below).

2️⃣ The Intelligence: Google's Meta-Controller
An internal AI unit that lives inside the Transformer. It escapes the "Token Trap" by extracting data to create a latent plan, steering the model via Temporal Abstraction (also sketched below).

The Synergy: In a Topological Transformer, the Meta-Controller finally has the "dedicated lanes" it needs to steer complex reasoning without causing gradient explosions.
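
To make the "multi-lane highway" idea concrete, here is a toy PyTorch sketch. It is my own illustration, not DeepSeek's implementation: lanes are mixed by a Sinkhorn-normalized (doubly stochastic) matrix, i.e. a point on the Birkhoff Polytope, initialized near the identity for stability. All names are made up.

```python
# Toy sketch of a "multi-lane" residual stream mixed by a doubly stochastic
# matrix (a point on the Birkhoff polytope). Illustrative only.
import torch
import torch.nn as nn

def sinkhorn(logits: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Push a square matrix toward the Birkhoff polytope: alternately
    normalize rows and columns so both sum to 1."""
    m = logits.exp()
    for _ in range(n_iters):
        m = m / m.sum(dim=1, keepdim=True)  # normalize rows
        m = m / m.sum(dim=0, keepdim=True)  # normalize columns
    return m

class MultiLaneResidual(nn.Module):
    def __init__(self, n_lanes: int, d_model: int):
        super().__init__()
        # Large diagonal logits -> mixing matrix starts near the identity,
        # so each lane initially maps to itself (stable identity mapping).
        self.mix_logits = nn.Parameter(torch.eye(n_lanes) * 5.0)
        self.blocks = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_lanes)]
        )

    def forward(self, lanes: torch.Tensor) -> torch.Tensor:
        # lanes: (n_lanes, batch, d_model)
        mix = sinkhorn(self.mix_logits)           # doubly stochastic
        mixed = torch.einsum("ij,jbd->ibd", mix, lanes)
        # Each lane gets its own residual update ("dedicated lane").
        updates = torch.stack([blk(x) for blk, x in zip(self.blocks, mixed)])
        return mixed + updates

lanes = torch.randn(4, 2, 64)                     # 4 lanes, batch 2, dim 64
out = MultiLaneResidual(n_lanes=4, d_model=64)(lanes)
print(out.shape)                                  # torch.Size([4, 2, 64])
```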

We aren't just making models bigger; we are making them architecturally smarter. 🧠
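
And an equally hypothetical sketch of the "latent plan" side: a small controller summarizes each chunk of hidden states into a plan vector and emits per-lane gates that stay fixed for k tokens (the temporal abstraction), rather than re-deciding at every token. Again, nothing here comes from Google; it only illustrates the idea.

```python
# Hypothetical latent-plan controller steering the lanes above via temporal
# abstraction: the plan is refreshed only every k tokens. Names are made up.
import torch
import torch.nn as nn

class LatentPlanController(nn.Module):
    def __init__(self, d_model: int, n_lanes: int, refresh_every: int = 8):
        super().__init__()
        self.refresh_every = refresh_every
        self.plan_enc = nn.GRU(d_model, d_model, batch_first=True)
        self.lane_gate = nn.Linear(d_model, n_lanes)  # plan -> lane weights

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model). Summarize each k-token chunk into a
        # plan vector, then emit per-lane gates held constant over the chunk.
        b, t, d = hidden.shape
        gates, plan = [], None
        for start in range(0, t, self.refresh_every):
            chunk = hidden[:, start:start + self.refresh_every]
            _, plan = self.plan_enc(chunk, plan)       # update latent plan
            g = self.lane_gate(plan[-1]).softmax(-1)   # (batch, n_lanes)
            gates.append(g.unsqueeze(1).expand(-1, chunk.size(1), -1))
        return torch.cat(gates, dim=1)                 # (batch, seq, n_lanes)

gates = LatentPlanController(d_model=64, n_lanes=4)(torch.randn(2, 37, 64))
print(gates.shape)  # torch.Size([2, 37, 4])
```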

#MachineLearning #DeepSeek #GoogleAI #Transformer #AIArchitecture
sergiopaniego posted an update 2 days ago
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks
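
If you'd rather start from a script than a notebook, a minimal SFT run looks roughly like this; the small model and dataset here are just placeholders, assuming a recent TRL version where SFTTrainer accepts a model name string:

```python
# Minimal SFT sketch with TRL; swap in your own model and dataset.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",          # small enough for free Colab
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-demo", max_steps=100),
)
trainer.train()
```
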
branikita posted an update 3 days ago
We tested the maximum dynamic payload of the SO-ARM101 with our parallel gripper and the base servo replaced by a Feetech STS3250. The maximum load before failure was 630 g, at which point the Feetech STS3215 in joint 3 failed — its large brass output gear was completely worn down.

The Feetech STS3250 in the base with a metal gear train withstood a significantly higher load.
mike-ravkine posted an update 3 days ago
Happy 2026 everyone!

I've been busy working on some new ranking/position methodologies and excited to start sharing some results.

Plot legends:

- X = truncation rate (low = good)
- ? = confusion rate (low = good)
- blue bars = average completion tokens (low = good)
- black diamonds = CI-banded performance (high = good)
- cluster squares = models inside this group are equivalent

openai/gpt-oss-120b remains the king in all dimensions of interest: truncation rates, completion lengths and performance. If I had but one complaint, it's that reason_effort does not seem to actually work - more on this soon.

Second is a 3-way tie in performance between the Qwen3-235B-2507 we all know and love and an unexpected entrant - ByteDance-Seed/Seed-OSS-36B-Instruct

This is a very capable model and its reasoning-effort control actually works, but you should absolutely not leave it on the default "unlimited" - set a sensible limit (4k works well for 8k context length).

Third place is another 3-way tie, this one between Seed-OSS-36B (it straddles the CI boundary between 2nd and 3rd place), Qwen/Qwen3-Next-80B-A3B-Instruct (demonstrating that full attention may be overrated after all and gated is the way to go) and the newly released zai-org/GLM-4.7, which offers excellent across-the-board performance with some of the shortest reasoning traces I've seen so far.
AdinaY posted an update about 8 hours ago
2025.1 - DeepSeek entered the scene, backed by High-Flyer Quant
2026.1 - IQuest enters the game, backed by Uniquant Quant 📈 and launching IQuest-Coder on Hugging Face
https://huggingface.co/collections/IQuestLab/iquest-coder

✨ 40B models: Instruct / Thinking / Loop
✨ Loop = MoE-level performance with only ~5% extra training cost
✨ Native 128K context
mitkox posted an update 1 day ago
I just stress-tested the Beast: MiniMax-M2.1 on Z8 Fury G5.
2101 tokens/sec. FORTY concurrent clients. That's 609 t/s out, 1492 t/s in. The model outputs fire faster than I can type, but feeds on data like a black hole on cheat day.
But wait, there's more! Threw it into Claude Code torture testing with 60+ tools, 8 agents (7 sub-agents because apparently one wasn't enough chaos). It didn't even flinch. Extremely fast, scary good at coding. The kind of performance that makes you wonder if the model's been secretly reading Stack Overflow in its spare time lol
3 months ago, these numbers lived in my "maybe in 2030" dreams. Today it's running on my desk AND heats my home office during the winter!
Reubencf posted an update 1 day ago
Happy New Year 2026
I plan to build many things this year; most of them will be cheaper or free alternatives to paid products.

I'm looking forward to releasing some useful Spaces ✌️ Stay tuned!
cedricbonhomme posted an update 2 days ago
With VLAgentIc, you can now use your local Qwen installation via Ollama and leverage the models CIRCL/vulnerability-severity-classification-roberta-base and CIRCL/cwe-parent-vulnerability-classification-roberta-base.

The project is available here:
https://github.com/vulnerability-lookup/VLAgentIc

The VLAI Severity and CWE classifiers are available on Hugging Face:
- CIRCL/vulnerability-severity-classification-roberta-base
- CIRCL/cwe-parent-vulnerability-classification-roberta-base
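
For reference, calling the severity classifier directly (outside the agent) is a few lines with transformers; the example description and the printed label are illustrative:

```python
# Direct local use of the VLAI severity classifier via transformers.
from transformers import pipeline

severity = pipeline(
    "text-classification",
    model="CIRCL/vulnerability-severity-classification-roberta-base",
)
text = (
    "A buffer overflow in the parsing routine allows remote attackers "
    "to execute arbitrary code via a crafted packet."
)
print(severity(text))  # e.g. [{'label': 'Critical', 'score': 0.98}]
```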

The concept of AI agents—combining models, tools, and orchestration—has become fairly standardized over the past year, but VLAgentIc brings something unique:

- Agents communicate over XMPP, enabling concurrent tasks and asynchronous messaging thanks to the SPADE framework (a minimal sketch follows this list).
- Built-in presence and discovery streamline interactions between components.
- Flexible behaviours make orchestrating AI-assisted security workflows seamless and open to future connections.
- Last but not least, the VLAI Severity and VLAI CWE classifiers are now wrapped as LLM Tools and run entirely locally.
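
As a rough illustration of the SPADE side (not VLAgentIc's actual code), here is a minimal agent with a cyclic behaviour that answers incoming XMPP messages; the JID, password, and a reachable XMPP server are assumptions, as is a recent SPADE version:

```python
# Hypothetical minimal SPADE agent showing the XMPP/behaviour model
# VLAgentIc builds on. JID/password/server are placeholders.
import spade
from spade.agent import Agent
from spade.behaviour import CyclicBehaviour

class ClassifierAgent(Agent):
    class HandleRequests(CyclicBehaviour):
        async def run(self):
            msg = await self.receive(timeout=10)  # wait for an XMPP message
            if msg:
                # A real agent would invoke a classifier tool here and reply
                # with the predicted severity or CWE.
                reply = msg.make_reply()
                reply.body = f"received: {msg.body}"
                await self.send(reply)

    async def setup(self):
        self.add_behaviour(self.HandleRequests())

async def main():
    agent = ClassifierAgent("classifier@localhost", "secret")
    await agent.start()

if __name__ == "__main__":
    spade.run(main())
```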

New, more comprehensive agent tools will soon be available, leveraging the Vulnerability-Lookup API and supporting the GCVE project.

The Human-in-the-Loop agent tool will be designed to notify you and request authorization whenever a query to an external service is about to be made—ensuring that, by default, all reasoning and processing stay local on your computer.

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (2507.03607)