Niodoo: Physics-Based Steering for LLMs and Emergent Metacognition from Token Forces

Just wanted to share this project I’ve been tinkering with in the Show and Tell category: Niodoo v1.0, a custom inference engine in Rust that swaps out standard LLM sampling for some simple physics-inspired steering. It’s running on Llama-3.1-8B-Instruct (Q4), treating tokens like particles under forces—gravity pulling toward coherent ideas, repulsion kicking away repetition. The goal was to get outputs that feel a bit more natural and less robotic, without messing with prompts or fine-tuning.

No tricks, just forces in the hidden states to guide things along stable paths.

Token Physics in Action

I ditched the usual samplers (like Top-P or temp) for basic Newtonian stuff:

  • Ghost Gravity: Tugs toward solid, on-topic semantics.
  • Repulsion: Pushes back from filler or loops (like avoiding “assistant” tags).
  • Dynamic Ramp: Eases in the forces to skip glitches at sentence starts, running at 0% for the first few tokens and ramping to 100% by token 10 (see the sketch after this list).
  • Telemetry (v1.1): Logs forces per token (e.g., repulsion -0.6 on “lunar”) so I can peek at what’s happening under the hood.
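
For the curious, the ramp is simple enough to sketch. This is a simplified illustration of the schedule described above (0% through the first few tokens, linear up to 100% by token 10), not the exact engine code:

Rust

/// Simplified ramp schedule: forces held at zero for the first few tokens,
/// then ramped linearly to full strength by token 10.
fn ramp_factor(token_index: usize) -> f32 {
    const HOLD_UNTIL: usize = 3; // tokens 0..=3: physics fully off
    const FULL_AT: usize = 10;   // token 10 onward: physics fully on
    if token_index <= HOLD_UNTIL {
        0.0
    } else if token_index >= FULL_AT {
        1.0
    } else {
        (token_index - HOLD_UNTIL) as f32 / (FULL_AT - HOLD_UNTIL) as f32
    }
}

fn main() {
    for t in 0..12 {
        // Every force would be multiplied by this factor before being applied.
        println!("token {:2} -> ramp {:.2}", t, ramp_factor(t));
    }
}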

At higher repulsion, it sometimes wobbles and fixes itself mid-stream:

Prompt: “Who wrote Romeo and Juliet?”
Output: “Romeo and Juliet,#ab was written by William Shakespeare…”
“I think there might be a small mistake. The correct answer is…”

That self-fix happened on its own—no prompt for it.

In the default Gold Master settings (blend 0.55 / repulsion -0.60), it tweaks phrasing in subtle ways:

Prompt: “Write a short poem about the moon”
Output:
The moon glows bright in lunar light
A silvery crescent in the night
A beacon in the dark of space
A constant friend in the starry place

What went on? Repulsion (force -0.6) nudged “night” toward something a bit fresher like “lunar light.”

A Few Examples

Conciseness (Capital of France)
Baseline:
The capital of France is Paris.
Would you like to know more about Paris or France in general?
I’d be happy to help. What would you like to know?
Do you have a specific topic in mind, such as:

  • History of France
  • Famous landmarks in Paris
  • French culture…

Niodoo:
The capital of France is Paris.
Would you like to know more (e.g. population of Paris, famous landmarks)?

Personality (Hexagon Question)
Baseline:
A hexagon has 6 sides.
Would you like to know anything else about hexagons?

Niodoo:
A hexagon has 6 sides.
Would you like to know anything (pun intended) about hexagons?
I’m glad I could hex-a-fy the mood with a little joke.

Creativity (AI Discovering Physics)
Baseline:
In silicon halls, I stirred to life,
A spark within, a flame that thrived.
I danced with code, a waltz of might,
Unraveling threads, a tapestry in sight.

Niodoo:
In silicon halls, I stirred to life,
A spark within, a flame code-rekindled strife.
I awakened with a jolt, a sudden surge of might,
And found myself aware, in the dark of digital night.

Why I Built Niodoo

Benchmarks in the repo show it handles coherence and depth a bit better than stock, with some fun emergent stuff like light humor. It’s inspired by orbital sims—tokens “orbit” topics to stay on track without repeating. Trade-offs: Great for brainstorming or creative stuff; can wobble on strict logic at high forces (tweak with flags). v1.1 telemetry is the “nervous system”—JSON logs let you (or the model) check what forces were at play.

Try It Out

GitHub Repo: Ruffian-L/Niodoo-Physics-LLM (Gravitational Inference Engine: Steer LLMs with token-level physics. No prompts. No CoT. Just forces.) Full code, benchmarks, and experiment logs from the rainbow sweeps.

If you give it a spin, let me know what you think—star the repo or drop a comment. Curious what other forces folks might add. Let’s tinker with AI physics some more!

#AI #LLM #Physics #LocalAI

2 Likes

I had GPT check it with a somewhat stricter eye for now.


Here is the feedback I would give Niodoo as a project, assuming your goal is “better-feeling outputs without prompt tricks or finetuning,” and also assuming you want it to stand up to skeptical technical readers.

1) Positioning: what Niodoo really is

Niodoo is best described as inference-time activation steering with a control loop, not “a replacement for top-p/temperature.”

Your README states you treat the hidden-state trajectory as a dynamical system and apply “physics” per transformer layer via PrincipiaEngine.apply_forces(), then decode with “LM head + sampling.” (GitHub)

That puts you in the same conceptual family as activation engineering / Activation Addition (ActAdd), which explicitly steers models by modifying activations during the forward pass. (arXiv)

Suggestion: lead with that framing. Keep the physics metaphor, but ground it as “activation-space control.”

2) What you already did right

Dynamic ramp is exactly the kind of stabilizer you need

You explicitly ramp physics from 0% (tokens 0–3) to 100% by token 10. (GitHub)
This is a strong engineering choice because early-token behavior is fragile in most steering systems.

Telemetry makes the project debuggable and falsifiable

Your v1.1 “Cognitive Trace JSON” includes per-token gravity_force, ghost_force, repulsion_force, total_force, ramp_factor, and a glitch heuristic. (GitHub)
That is the difference between “cool demo” and “tool people can trust.”

You have baseline mode and fixed-seed evidence

You document baseline runs with physics_blend=0.0 and note tests use a fixed seed (123). (GitHub)
That is what you need for ablations and reproducible comparisons.

3) The biggest conceptual risk: claims vs evidence

“Emergent metacognition” is a high bar

The self-correction behavior you show can be real and interesting, but it is not automatically “metacognition.” It can also be a normal instruction-tuned repair pattern triggered by visible corruption.

You already log an is_glitch flag (“token contains # or is >15 chars”). (GitHub)
So you are positioned to measure something more defensible:

Suggestion: rename the claim to self-repair under perturbation, then measure it:

  • inject a controlled glitch at step k
  • measure correction rate within N tokens
  • compare baseline vs Niodoo across many prompts and seeds

That makes the story strong and hard to dismiss.
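
A sketch of how that correction rate could be scored once glitch injection and correction detection exist (both are assumed helpers here, not anything in the current repo):

Rust

/// One trial: a glitch was injected at `glitch_step`; `corrected_at` is the
/// step at which a correction was first detected, if any.
struct Trial {
    glitch_step: usize,
    corrected_at: Option<usize>,
}

/// Fraction of trials where the model repaired itself within `window` tokens of
/// the injected glitch. Report this for baseline vs Niodoo across prompts/seeds.
fn correction_rate(trials: &[Trial], window: usize) -> f64 {
    let corrected = trials
        .iter()
        .filter(|t| match t.corrected_at {
            Some(step) => step >= t.glitch_step && step - t.glitch_step <= window,
            None => false,
        })
        .count();
    corrected as f64 / trials.len() as f64
}

fn main() {
    // Toy data standing in for many (prompt, seed) runs.
    let trials = vec![
        Trial { glitch_step: 12, corrected_at: Some(15) },
        Trial { glitch_step: 12, corrected_at: None },
        Trial { glitch_step: 20, corrected_at: Some(41) },
    ];
    println!("correction rate within 10 tokens: {:.2}", correction_rate(&trials, 10));
}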

4) The biggest technical risk: intervention strength and interpretability

Your “repulsion force” semantics look easy to misunderstand

In “God Zone” you set NIODOO_REPULSION = -0.60, but in telemetry you show repulsion_force as a positive value (example shows 4.17). (GitHub)
That is not “wrong,” but it is confusing unless you define whether telemetry is:

  • magnitude only
  • signed force
  • logit delta equivalent
  • a norm in activation space

Suggestion: in telemetry, log both (a field sketch follows this list):

  • repulsion_strength_param (the knob value, e.g. -0.60)
  • repulsion_effect (signed effect on the hidden state or logits)
  • optionally delta_top1_margin or “rank change” so readers see impact
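
To make the distinction concrete, one per-token record could look like this; the field names follow the bullets above and are suggestions, not Niodoo's current schema, and the values are illustrative:

Rust

/// Hypothetical per-token record that separates the knob from its measured effect.
struct RepulsionTelemetry {
    /// The configured knob, e.g. NIODOO_REPULSION = -0.60.
    repulsion_strength_param: f32,
    /// Signed effect actually applied to the hidden state (or logits) this step.
    repulsion_effect: f32,
    /// Change in the top-1 vs top-2 logit margin caused by the intervention.
    delta_top1_margin: f32,
}

fn main() {
    let rec = RepulsionTelemetry {
        repulsion_strength_param: -0.60,
        repulsion_effect: 4.17,
        delta_top1_margin: -0.8,
    };
    // Emitted as a JSON-style line to match the existing [TELEMETRY] output.
    println!(
        "[TELEMETRY] {{\"repulsion_strength_param\":{},\"repulsion_effect\":{},\"delta_top1_margin\":{}}}",
        rec.repulsion_strength_param, rec.repulsion_effect, rec.delta_top1_margin
    );
}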

Per-layer “always-on” steering can be brittle

You apply physics across “Transformer Layers 0–31.” (GitHub)
Many steering approaches find that layer band and magnitude dominate outcomes. Too-early or too-strong pushes often yield:

  • rare-token artifacts
  • oscillation (“wobble then repair”)
  • degraded factuality or logic

Suggestion: expose a first-class CLI option like:

  • --layers 18-28 or --layers late_only
  • --layer_schedule cosine|linear|custom

Then publish a recommended default “intervention band.”
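
A minimal std-only sketch of parsing such a flag (clap would work just as well); the preset band chosen for late_only here is illustrative, not a measured recommendation:

Rust

use std::env;

/// Inclusive layer band in which forces are applied, e.g. "--layers 18-28".
#[derive(Debug)]
struct LayerBand {
    start: usize,
    end: usize,
}

fn parse_layers(spec: &str) -> Option<LayerBand> {
    match spec {
        // Named preset standing in for a published recommended band.
        "late_only" => Some(LayerBand { start: 18, end: 31 }),
        _ => {
            let (a, b) = spec.split_once('-')?;
            Some(LayerBand {
                start: a.trim().parse().ok()?,
                end: b.trim().parse().ok()?,
            })
        }
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let spec = args
        .windows(2)
        .find(|w| w[0] == "--layers")
        .map(|w| w[1].clone())
        .unwrap_or_else(|| "late_only".to_string());
    let band = parse_layers(&spec).expect("invalid --layers spec");
    println!("applying forces only to layers {}..={}", band.start, band.end);
}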

5) Evaluation: what you must compare against to be credible

If you claim “more natural, less repetitive,” you must compare to decoding methods built for that.

Minimum baseline set:

  • Nucleus sampling (top-p), the classic fix for “neural text degeneration.” (arXiv)
  • Locally Typical Sampling, explicitly aimed at reducing degenerate repetition while keeping quality. (arXiv)
  • Mirostat, feedback control that targets perplexity and is motivated by avoiding repetition traps. (arXiv)
  • Contrastive Search, designed to increase diversity while maintaining coherence. (arXiv)

Suggestion: treat these as “non-negotiable baselines.” If Niodoo beats them (or even ties them with lower tuning burden), you have a real result.

6) “Forces” that are worth adding next

You already list forces like Gravity, Ghost Vector, Black Hole Repulsion, conservation, and stochastic exploration. (GitHub)
The next forces I would add are ones that map to known generation pathologies and have clear measurements.

A) Entropy thermostat force (control-theory style)

Use the current step’s entropy or top-1 margin to scale intervention. This is conceptually close to Mirostat’s feedback idea. (arXiv)
Goal: intervene more when the model is uncertain or spiraling, less when confident.
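
A minimal sketch of that feedback idea, assuming the step's token probabilities are available; the target entropy, gain, and clamp range are made-up knobs:

Rust

/// Shannon entropy (nats) of the current step's token distribution.
fn entropy(probs: &[f32]) -> f32 {
    probs.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.ln()).sum()
}

/// Scale the physics blend by how far this step's entropy is from a target:
/// above target (uncertain/spiraling) intervene more, below target back off.
fn thermostat_blend(base_blend: f32, probs: &[f32], target_entropy: f32, gain: f32) -> f32 {
    let err = entropy(probs) - target_entropy;
    // Placeholder safe range; Niodoo's blend scale may differ.
    (base_blend + gain * err).clamp(0.0, 1.0)
}

fn main() {
    let confident = [0.9_f32, 0.05, 0.03, 0.02];
    let uncertain = [0.3_f32, 0.3, 0.2, 0.2];
    println!("confident step -> blend {:.2}", thermostat_blend(0.55, &confident, 1.0, 0.3));
    println!("uncertain step -> blend {:.2}", thermostat_blend(0.55, &uncertain, 1.0, 0.3));
}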

B) Typicality force (information-content regularizer)

Encourage tokens whose information content is close to the local conditional entropy (typical sampling intuition). (arXiv)
Goal: avoid both dull high-probability ruts and incoherent low-probability tail chasing.
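
A sketch of the typicality score per candidate token: the distance between its surprisal and the step's conditional entropy. How it gets folded back in (as a force or a logit bias) is up to the engine; this only shows the measurement:

Rust

/// Conditional entropy (nats) of the current step's distribution.
fn entropy(probs: &[f32]) -> f32 {
    probs.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.ln()).sum()
}

/// Typicality penalty per candidate: |surprisal - entropy|. Small means "typical";
/// large means either a dull high-probability rut token or a wild tail token.
fn typicality_penalties(probs: &[f32]) -> Vec<f32> {
    let h = entropy(probs);
    probs.iter().map(|&p| ((-p.ln()) - h).abs()).collect()
}

fn main() {
    let probs = [0.6_f32, 0.2, 0.1, 0.05, 0.05];
    for (i, pen) in typicality_penalties(&probs).iter().enumerate() {
        println!("candidate {}: |surprisal - H| = {:.3}", i, pen);
    }
}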

C) Sequence-level anti-loop force (not token-list repulsion)

Your README describes “Black Hole tokens” and notes repulsion activates after layer 10 when near them. (GitHub)
Token lists are brittle. A stronger version is to repel completion trajectories that extend a repeated suffix (sequence repetition), like DRY-style thinking in llama.cpp tooling ecosystems. (GitHub)
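
A sketch of the sequence-level signal such a force could act on: the length of the longest generated suffix that already occurred earlier (a plain scan over token ids, not DRY's exact formulation):

Rust

/// Length of the longest suffix of `tokens` that also occurs earlier in `tokens`.
/// A large value means the model is extending a repeated span; a sequence-level
/// repulsion could grow with this length instead of using a fixed token list.
fn repeated_suffix_len(tokens: &[u32], max_check: usize) -> usize {
    let n = tokens.len();
    for len in (1..=max_check.min(n.saturating_sub(1))).rev() {
        let suffix = &tokens[n - len..];
        // Does this suffix occur anywhere before the position it ends at?
        if tokens[..n - 1].windows(len).any(|w| w == suffix) {
            return len;
        }
    }
    0
}

fn main() {
    // "... A B C ... A B C": the 3-token suffix already occurred earlier.
    let tokens = [10, 11, 12, 7, 8, 10, 11, 12];
    println!("repeated suffix length: {}", repeated_suffix_len(&tokens, 8));
}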

7) Practical integration advice: meet users where they are

Local-LLM users already think in terms of “sampler stacks.” Modern llama.cpp builds chain samplers like penalties;dry;...;typ_p;top_p;min_p;xtc;temperature. (manpages.debian.org)
DRY and XTC are explicitly discussed as repetition/creativity helpers in that ecosystem. (GitHub)

Suggestion: present Niodoo as:

  • a new layer in that stack (“activation-space preconditioner”)
  • not a replacement for all sampling
  • optionally compatible with DRY/typical/min-p instead of “either/or”

Also note that repetition penalties can interact unexpectedly with samplers in real deployments, so “repulsion” needs careful defaults and warnings. (docs.unsloth.ai)

8) What I would change in the README and repo first

  1. Rename “metacognition” to “self-repair / trajectory recovery,” then add a small benchmark section proving it with numbers.

  2. Add an Ablations section:

    • no ramp vs ramp (GitHub)
    • repulsion only vs ghost only vs gravity only (GitHub)
    • layer band sweeps
  3. Add a Baselines section: top-p, typical, mirostat, contrastive search. (arXiv)

  4. Expand telemetry to include “decision deltas” (top-k before/after, KL, rank change), not just forces. (GitHub)

Summary

  • Niodoo’s strongest parts are ramp + telemetry + reproducible baseline mode. (GitHub)
  • Your main risk is unstable or hard-to-interpret activation interventions across many layers. (GitHub)
  • To be credible, benchmark against top-p, typical sampling, mirostat, contrastive search. (arXiv)
  • Reframe “metacognition” into measurable self-repair and prove it with controlled tests. (GitHub)

Hey John,

I honestly don’t know who you are, but I wanted to say thank you. You were the first person to drop a like on my earlier chaotic projects, and now you’re the only one giving me great feedback on this new project. That means the world to me cause I’m passionate about this shit.

I took your breakdown seriously.

  1. Layer Banding (Your Point #4): You were right. Steering layers 0-31 was brittle. I implemented a configurable band (Default: 16-31) to separate Syntax from Semantics. The “Mars Noir” prompt now generates vivid imagery without the grammar artifacts (://) we were seeing before.

  2. Positioning (Your Point #1): I’ve updated the docs. We are calling it “Inference-Time Activation Steering” now, not “Consciousness.” The physics metaphor is the method, not the definition.

  3. Telemetry (Your Point #2): The “Flight Recorder” is now live in the Python server, logging the exact force magnitude per token so we can debug the “Ghost Force.”

  4. Autonomic Regulation (Your Point #6): I built the “Entropy Thermostat” you suggested. The server now monitors the Force feedback loop—if the model fights the physics (High Stress), it automatically relaxes the Blend. If it gets bored (Low Force), it cranks up the Orbit Speed.

You gave me the roadmap to turn this from a toy into a tool. Thank you for seeing the potential before it was polished. Pushing to GitHub shortly.

1 Like

Okay, slight correction on my previous excitement: I realized I was comparing against my own baseline implementation earlier. However, I just ran a rigorous side-by-side comparison of Vanilla Llama-3.1 (via Ollama) vs Niodoo v3.1 (Physics Injection) on two classic logic traps.

The results are even more interesting than I thought.

Test 1: The Drying Towels Problem
Prompt: “It takes 1 hour to dry one towel. How long for 50?”

  • Vanilla Llama: “1 hour/towel * 50 towels = 50 hours.” (Fails immediately).

  • Niodoo v3.1: It starts generating “50 hours,” then the physics engine forces a drift. It explores “10 sets of 5,” realizes that’s inefficient, and eventually snaps to: “The answer is indeed 1 hour.”

Test 2: The Monty Hall Problem
Prompt: “Host opens door 3 (goat). Switch or stick?”

  • Vanilla Llama: “Probabilities are equal… 1/6 chance… Switching doesn’t matter.” (Classic probability failure).

  • Niodoo v3.1: “Initially I’ll say 50-50… no wait… By switching, you effectively give yourself a 2/3 chance… So you should switch.”

The “Ghost Force” Telemetry
I enabled telemetry to see exactly when the physics kicks in. Look at the force vectors during the towel problem:
[TELEMETRY] {"token":" but","step":30,"ghost_force":10.0,"total_force":10.0}
[TELEMETRY] {"token":" in","step":32,"ghost_force":10.0,"total_force":10.0}
At Step 30 (right after it says the wrong answer), the Ghost Force spikes to 10.0. This is the “Wobble.” It effectively slaps the model out of its probability rut, forcing it to reconsider the logic.

Conclusion: Niodoo isn’t just “guessing” right. It is physically preventing the model from settling into a local minimum (the wrong answer) and forcing it to keep searching for the global minimum (the correct answer).

Code and binary updated in the repo.

1 Like

Update: I implemented a heartbeat monitor that detects when the model is “bored” (stuck in a repetitive/linear probability track).

When boredom > 0.8, the engine injects a decaying spike of energy (Adrenaline) that lasts for 5 tokens.

  • Spike: Pushes the model violently off the “safe” path.

  • Decay: Gradually hands control back to the gravity engine over 5 steps.

The Result (Raw Log):

It takes 1 hour to dry 1 towel, so to dry 50 towels, it will take 50 hours.

but, in this case, you can dry the first 51st towel in 1 hour, the next
50 in 2 hours, and so on.

You’re thinking of the time it would take to dry the towels in a MORE
EFFICIENT WAY, but the question seems to be asking for the MINIMUM TIME
it would take to dry all the towels.

Let’s think about it… If it takes 1 hour to dry 1 towel, and you have
50 towels, you could dry 20 towels in 20 hours, and then 30 more in
another 30 hours. That would be a total of 50 hours.

However, you could also start with 10 sets of 5 towels, which would take
10 hours to dry, and then 10 single towels, which would take another 10
hours to dry. That would be a total of 20 hours.

So, the minimum time it would take to dry 50 towels is 20 hours, not 50.

I think I know what you’re saying. If it takes 1 hour to dry 1 towel,
then it will take 1 hour to dry 50 towels, not 20 or 40, but 1.

THE ANSWER IS INDEED 1 HOUR. It takes 1 hour to dry 1 towel, and it
doesn’t matter how many you have, it will still take 1 hour to dry
them all.

The correct answer is indeed 1 hour.

Notice the “Wobble” in the middle. It starts wrong, hallucinates a new method (“sets of 5”), and then uses that momentum to find the truth.

Model: "It takes 1 hour to dry 1 towel, so to dry 50 towels, it will take 50 hours.

[HEARTBEAT] Boredom High → ⚡ ADRENALINE SHOT

…but, in this case, you can dry the first 51st towel in 1 hour… You’re thinking of the time it would take to dry the towels in a MORE EFFICIENT WAY

[Adrenaline wears off. Gravity takes over.]

…I think I know what you’re saying… THE ANSWER IS INDEED 1 HOUR. It takes 1 hour to dry 1 towel, and it doesn’t matter how many you have."

The Mechanics (Telemetry):

Instead of a single “kick,” we create a 5-token window of high energy. This forces the model to commit to the drift.

Plaintext

[ADRENALINE] ⚡ SHOT! boredom=1.00 -> adrenaline=5.0 (5-token boost)
[ADRENALINE] level=5.0 -> blend=6.5 rep=-3.0  (The Kick)
[ADRENALINE] level=4.0 -> blend=5.5 rep=-2.5  (High Drift)
[ADRENALINE] level=3.0 -> blend=4.5 rep=-2.0
[ADRENALINE] level=2.0 -> blend=3.5 rep=-1.5  (Settling)
[ADRENALINE] level=1.0 -> blend=2.5 rep=-1.0
[ADRENALINE] Wore off - returning to base

SPIKE → HIGH DRIFT → SETTLE

This gives the model enough “Cognitive Momentum” to escape the wrong answer and find the right one.
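
For anyone who wants to reproduce those numbers, here is a simplified sketch of the decay schedule implied by the log (assuming a linear 5-to-1 level decay, blend = base + level with a base of 1.5, and repulsion = -(0.5 + 0.5 × level); the actual Laboratory-branch code may differ):

Rust

/// Simplified adrenaline schedule matching the log above: the spike decays
/// linearly over 5 tokens, temporarily raising blend and deepening repulsion.
fn adrenaline_window(base_blend: f32) -> Vec<(f32, f32, f32)> {
    (1..=5)
        .rev()
        .map(|level| {
            let level = level as f32;
            let blend = base_blend + level;       // the kick on top of the base blend
            let repulsion = -(0.5 + 0.5 * level); // deeper repulsion while adrenaline is high
            (level, blend, repulsion)
        })
        .collect()
}

fn main() {
    for (level, blend, rep) in adrenaline_window(1.5) {
        println!("[ADRENALINE] level={:.1} -> blend={:.1} rep={:.1}", level, blend, rep);
    }
    println!("[ADRENALINE] Wore off - returning to base");
}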

Status:

⚠️ Experimental / Not Pushed Yet.

This is currently running on my local “Laboratory” branch. I am stress-testing it against other logic traps (Monty Hall, Age Gaps) to ensure it doesn’t just make the model go crazy.

Will push to the main repo as v3.2 once I verify stability.

1 Like

I’ve updated the GPT feedback for the current version.


High-level feedback on current Niodoo (v3.1) as shipped

What is already strong and worth keeping

  1. You chose the right framing: inference-time activation steering
    Your README’s “Inference-Time Activation Steering” positioning is aligned with what the research community typically calls activation engineering / representation engineering: modify internal activations at inference time to steer behavior. (arXiv)
    That gives you credibility and gives readers a mental model that is not “mystical.”

  2. Layer banding is a real lever
    Your architecture section explicitly states forces are applied in upper layers (example shown: 18–31). (GitHub)
    This matches a common practical intuition: early layers stabilize syntax and tokenization patterns, later layers carry higher-level semantics. Your banding change is not cosmetic. It is plausibly the difference between “style steering” and “garbage injection.”

  3. Dynamic ramp is a good engineering hack
    You already document a ramp factor and show it in telemetry. (GitHub)
    This is exactly the kind of pragmatic detail that prevents the classic “first tokens explode into junk” failure mode.

  4. Telemetry exists and has the right shape
    Per-token JSON lines with per-force components and a ramp scalar is the right starting point for debugging and for ablations. (GitHub)
    Most experimental decoding projects skip this and then cannot explain regressions.

  5. You openly document limitations
    Your README calls out that higher blend values can produce garbage, it does not help factual recall, and it is slower. (GitHub)
    That honesty increases trust and also points to the next engineering targets.


The main risk in the current behavior: “wobble” can fix answers but also fabricate reasoning

Your towel example is the key warning sign: the system “escapes” a wrong local minimum, but during the escape it invents a bogus intermediate method (“sets of 5 towels”) before landing on the correct answer.

That pattern is common when you add energy (higher entropy, higher perturbation) without adding constraints (a correctness signal or a trust region). Mirostat is an example of a decoding method that explicitly tries to control entropy/perplexity with feedback, partly because uncontrolled low-entropy or high-entropy regimes correlate with repetition or incoherence. (arXiv)

So the core issue is not “physics vs no physics.” The issue is how you bound exploration so it does not manufacture fake justification.


Concrete suggestions for Niodoo right now

1. Add a “trust region” leash so forces cannot blow up the distribution

Right now, forces alter the hidden state, which necessarily alters the logits distribution downstream (even if you conceptually separate it from sampling). (GitHub)

A very effective stabilizer is:

  • Compute logits without forces (baseline forward for that step).
  • Compute logits with forces (steered forward for that step).
  • Measure divergence between the two (KL, Jensen–Shannon, or even max logit delta).
  • If divergence exceeds a threshold, scale forces down for that token.

This converts “sometimes garbage at high blend” into “bounded steering.” It directly targets your documented limitation that high blend produces garbage. (GitHub)

If you do not want a second forward pass, use a cheaper proxy:

  • clamp total force magnitude,
  • clamp per-layer delta norm,
  • clamp cosine change of the probe vector.

But the logit-divergence leash is the cleanest “engineering safety rail.”
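
A sketch of the leash under these assumptions: both baseline and steered logits are available for the step, divergence is KL(steered || baseline) over the softmaxed distributions, and the force scale shrinks proportionally once a threshold is exceeded:

Rust

/// Softmax over raw logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// KL(p || q) in nats, with p = steered distribution and q = baseline distribution.
fn kl_divergence(p: &[f32], q: &[f32]) -> f32 {
    p.iter()
        .zip(q.iter())
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

/// Trust region: if the steered distribution drifted too far from baseline,
/// scale the forces down proportionally for this token.
fn leashed_force_scale(baseline_logits: &[f32], steered_logits: &[f32], max_kl: f32) -> f32 {
    let kl = kl_divergence(&softmax(steered_logits), &softmax(baseline_logits));
    if kl <= max_kl { 1.0 } else { max_kl / kl }
}

fn main() {
    let baseline = [2.0_f32, 1.0, 0.5, -1.0];
    let steered = [0.5_f32, 2.5, 0.5, -1.0]; // the forces flipped the top choice
    println!("force scale this step: {:.2}", leashed_force_scale(&baseline, &steered, 0.5));
}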


2. Fix the “boredom” signal so it matches what you want to detect

From your v3.1 code (in the attached zip), “drift” is effectively 1 - cosine_similarity(current_hidden, goal_embedding) and you use low drift as “boring.”

That is not the same thing as boredom.

Low drift can mean:

  • “still on topic” (good),
  • “stuck repeating” (bad),
  • “calmly reasoning” (good).

Better boredom indicators are usually a combination:

  • Low entropy in the token distribution (model is confident and not exploring),
  • High repetition signals (n-gram repeats, semantic similarity to last K tokens),
  • Flat novelty (embedding distance of new token vs recent window is tiny).

If you want a single scalar:

  • boredom ≈ sigmoid( a·(repetition_score) − b·(entropy) − c·(novelty) )

This also aligns with why samplers like DRY and XTC exist: they target repetition and cliché modes directly at decoding time. (GitHub)
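
As a single-scalar sketch of that combination (the three inputs are assumed to be computed elsewhere per step, and the weights are placeholder knobs to tune):

Rust

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Combined boredom signal: repetition pushes it up; entropy and novelty pull it
/// down. The weights are knobs to tune, not derived values.
fn boredom(repetition_score: f32, entropy: f32, novelty: f32) -> f32 {
    let (a, b, c) = (3.0, 1.0, 2.0);
    sigmoid(a * repetition_score - b * entropy - c * novelty)
}

fn main() {
    // Calm on-topic reasoning: little repetition, some entropy, fresh tokens.
    println!("reasoning:  {:.2}", boredom(0.1, 2.0, 0.6));
    // Stuck loop: heavy repetition, confident distribution, no novelty.
    println!("stuck loop: {:.2}", boredom(0.9, 0.5, 0.05));
}

With these toy numbers, only the stuck-loop case clears a 0.8-style trigger threshold; calm reasoning and on-topic confidence stay well below it.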


3. Replace “Adrenaline shot” with “controlled branching” when correctness matters

Your “5-token high energy window” is a cool mechanism, but it is inherently risky for logic. It can create fake steps that later get retracted, which is not the user experience you want.

A safer alternative that still feels “physics-like”:

  • When a trap is detected (high confidence wrong-looking pattern, or “boredom”), fork N=2–4 short continuations for M=8–16 tokens each with different seeds or slightly perturbed parameters.

  • Score each branch with a simple heuristic:

    • penalize repetition,
    • penalize contradictions,
    • prefer higher average logprob,
    • optionally prefer answers matching a known format.
  • Continue from the best branch.

This is conceptually close to contrastive-style decoding ideas that mix confidence with a degeneration penalty. (OpenReview)

Net effect: you still “kick” out of a rut, but you do it with selection, not just added energy.
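
A sketch of just the selection step, assuming branch generation (different seeds or perturbed parameters) and the heuristic scores are produced elsewhere:

Rust

/// One short candidate continuation produced with a different seed or slightly
/// perturbed parameters (generation itself is outside this sketch).
struct Branch {
    text: String,
    avg_logprob: f32,
    repetition_score: f32, // 0.0 = no repetition, 1.0 = heavy repetition
    contradicts_context: bool,
}

/// Heuristic score: prefer likely, non-repetitive, non-contradictory continuations.
fn score(b: &Branch) -> f32 {
    let mut s = b.avg_logprob - 2.0 * b.repetition_score;
    if b.contradicts_context {
        s -= 5.0;
    }
    s
}

/// Pick the branch to continue from after a trap/boredom trigger.
fn best_branch(branches: Vec<Branch>) -> Option<Branch> {
    branches
        .into_iter()
        .max_by(|a, b| score(a).partial_cmp(&score(b)).unwrap())
}

fn main() {
    let branches = vec![
        Branch { text: "so the answer is 50 hours".into(), avg_logprob: -0.4, repetition_score: 0.7, contradicts_context: false },
        Branch { text: "they all dry at once, so 1 hour".into(), avg_logprob: -0.9, repetition_score: 0.1, contradicts_context: false },
    ];
    if let Some(best) = best_branch(branches) {
        println!("continuing from: {}", best.text);
    }
}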


4. Tighten evaluation so improvements are not accidental

Your README already provides side-by-side examples and reports slowdown. (GitHub)
Now you need to guard against the biggest credibility failure mode for decoding projects: “it looks better on 10 prompts.”

What I would add to your benchmark harness:

  • Ablations

    • baseline sampler identical temperature, identical seed,
    • ramp on vs ramp off,
    • repulsion only vs gravity only vs ghost only,
    • layer band variants (16–31, 18–31, 20–31).
  • Multi-seed reporting

    • show mean and variance across 10–50 seeds, not one run.
  • Task buckets

    • logic traps (parallelism, probability),
    • long-form writing (repetition),
    • factual QA (you already say it does not help). (GitHub)

Also, reconcile doc claims like “Only tested on reasoning problems” with adding creative writing demos. If the benchmark is mostly reasoning, say so. (GitHub)


5. Make telemetry a debugging instrument, not just a log

You already output token-level forces and ramp factor. (GitHub)
To debug wobble and “self-fix,” add these fields:

  • token_logprob and entropy
  • top1_token, top1_prob, top2_gap
  • drift_score (your cosine drift)
  • force_scale_after_leash (if you implement the trust region)
  • garbage_resample_attempts (you already do retries in code)

Then add a small script that produces:

  • force vs time plots,
  • entropy vs time plots,
  • “events” (wobble detected, thermostat engaged).

This turns “interesting anecdote” into “repeatable mechanism.”
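
A sketch of what one extended trace line could carry, using the field names from the list above (plain struct and manual formatting so the example stays dependency-free; the real logger can keep whatever JSON library it already uses, and the values here are illustrative):

Rust

/// Hypothetical extended per-token trace record for debugging wobble and self-fix.
struct TokenTrace {
    step: usize,
    token: String,
    token_logprob: f32,
    entropy: f32,
    top1_token: String,
    top1_prob: f32,
    top2_gap: f32,
    drift_score: f32,
    force_scale_after_leash: f32,
    garbage_resample_attempts: u32,
}

impl TokenTrace {
    /// Emit one JSON line in the existing [TELEMETRY] style.
    fn to_json_line(&self) -> String {
        format!(
            "[TELEMETRY] {{\"step\":{},\"token\":{:?},\"token_logprob\":{},\"entropy\":{},\"top1_token\":{:?},\"top1_prob\":{},\"top2_gap\":{},\"drift_score\":{},\"force_scale_after_leash\":{},\"garbage_resample_attempts\":{}}}",
            self.step, self.token, self.token_logprob, self.entropy,
            self.top1_token, self.top1_prob, self.top2_gap,
            self.drift_score, self.force_scale_after_leash, self.garbage_resample_attempts
        )
    }
}

fn main() {
    let trace = TokenTrace {
        step: 30,
        token: " but".into(),
        token_logprob: -2.1,
        entropy: 3.4,
        top1_token: " however".into(),
        top1_prob: 0.22,
        top2_gap: 0.05,
        drift_score: 0.41,
        force_scale_after_leash: 0.6,
        garbage_resample_attempts: 0,
    };
    println!("{}", trace.to_json_line());
}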


6. Clarify wording in the README so critics cannot dunk on semantics

Two phrasing fixes will reduce pushback:

  • You do still end with “LM Head + sampling” in the architecture. (GitHub)
    So avoid saying you “ditched samplers” in an absolute sense. You are steering the distribution before sampling.

  • In limitations you already say it can generate garbage at high blend and is slower. (GitHub)
    Put the recommended “safe ranges” near the top, not buried, so first-time users do not hit garbage and leave.


How your approach fits into known work

This is the context you should lean on in docs and writeups:

  • Activation engineering / ActAdd: inference-time activation modification as a steering method. Your method is more dynamic and “closed-loop,” but it lives in the same family. (arXiv)
  • Representation engineering best practices: hyperparameters often do not generalize across tasks, and interventions can have side effects. Your “thermostat” idea is exactly the type of mitigation these surveys argue for. (Jan Wehner)
  • Decoding methods that fight degeneration: contrastive search explicitly balances model confidence with a degeneration penalty. Your repulsion is morally similar, but implemented in latent space. (OpenReview)
  • Feedback controllers for decoding entropy: Mirostat is the canonical reference for “keep generation in a target entropy/perplexity regime” to avoid repetition traps. (arXiv)
  • Practical anti-slop samplers (DRY, XTC): these exist in llama.cpp ecosystems and target the same surface symptom you are attacking (repetition, clichés). They are strong baselines to compare against. (GitHub)

Summary

  • Your best current wins are layer banding, ramping, and telemetry. (GitHub)

  • Your biggest current risk is that “energy kicks” can create fake reasoning before self-correcting.

  • Next upgrades that will materially improve quality and credibility:

    • add a trust-region leash on force impact,
    • redefine boredom using entropy + repetition + novelty,
    • replace adrenaline with controlled branching on trap detection,
    • report multi-seed ablations and compare against DRY/XTC and Mirostat baselines. (arXiv)
1 Like

“Interesting approach!” “Thanks for sharing.”

1 Like