Managing shared state in multi-agent workflows — what’s working for you?

I’ve been experimenting with multi-agent setups (planner → executor → reviewer, sometimes with tools), and the hardest part so far hasn’t been model choice; it’s state. Once workflows go beyond a single step, context gets split between prompts, intermediate outputs, and application code.

When retries or branching logic are involved, it becomes surprisingly hard to understand why an agent behaved a certain way, or to reproduce failures consistently. Logs help, but they don’t always tell the full story.

To explore this, I started using an orchestration-style approach where agents operate against an explicit shared spec/state instead of passing context implicitly. I’ve been testing a tool called Zenflow for this, mainly to make workflows more inspectable and predictable, but I’m still evaluating it.

I’m interested in how others here are approaching this. Are you managing state purely in code, using external stores/state machines, or relying on framework-level abstractions?


I’ve run into the exact same problem you’re describing. Once you go beyond a single-step agent, “state” stops being a single prompt and turns into a scattered mess across system prompts, intermediate outputs, tool calls, retries, and application glue. At that point, model choice matters less than whether you can explain and reproduce what happened.

Through my Rendered Frame Theory (RFT) work I ended up formalising a practical approach I call Multi-Positional (LMP): agents don’t pass context implicitly; they operate against an explicit shared spec/state, and every step writes a positioned, auditable update into an append-only record. The payoff is simple: predictable branching, deterministic replay, and a clear answer to “why did it do that?”. Here’s how to implement it in a way that maps cleanly onto orchestration tools (including Zenflow-style flows) without depending on any one framework.

  1. Make state a first-class contract (single source of truth)
    Create a single state.json that is treated as the truth. Prompts become views over state, not the state itself. Keep it small, explicit, and typed. A minimal shape that works well in practice: run (ids + versions), goal, constraints, facts (verified), assumptions (unverified), plan (ordered steps), tasks (status + artifacts), tool_context (cached tool outputs), decisions (what was committed and why), errors (what failed and where), metrics (optional).
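
For concreteness, here is a minimal sketch of that shape bootstrapped from Python; the field names follow the list above, while the example values (run id, goal text, schema version) are just placeholders:

```python
# Minimal, illustrative state.json bootstrap. Field names follow the list above;
# all values are placeholders.
import json

initial_state = {
    "run": {"run_id": "run-001", "schema_version": "1.0"},
    "goal": "Summarise the quarterly report",
    "constraints": ["no external network calls"],
    "facts": [],          # verified information only
    "assumptions": [],    # unverified; must be promoted to facts or discarded
    "plan": [],           # ordered steps, written by the planner
    "tasks": {},          # task_id -> {spec, priority, status, result, verification}
    "tool_context": {},   # cached tool outputs keyed by call signature
    "decisions": [],      # what was committed and why
    "errors": [],         # what failed and where
    "metrics": {},        # optional
}

with open("state.json", "w") as f:
    json.dump(initial_state, f, indent=2)
```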

  2. Use an append-only event log (JSONL) instead of “logs”
    Don’t rely on print logs to reconstruct behavior; write structured events you can replay. Create events.jsonl where each line is one event. Each agent step must emit exactly one event that includes:

  • identity: run_id, event_id, timestamp
  • position: role, step_id, branch_id
  • provenance: parent_event_id, retry_of
  • determinism: model_id, params (temperature/top_p), code_version/git_sha, prompt_template_version, tool versions
  • I/O refs: prompt_ref, tool_calls, tool_results_ref, output_ref
  • update: state_patch (the delta)
    Add an integrity hash (sha256 of the canonical event payload) so the record is tamper-evident.
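
A minimal sketch of emitting one such event, assuming Python and a JSONL file on disk; the helper name (emit_event) and the exact hashing convention (sha256 over a canonically serialised payload with sorted keys) are illustrative choices, not a fixed spec:

```python
# Sketch: append one structured event to events.jsonl with a tamper-evident hash.
# Field names mirror the bullet list above.
import hashlib
import json
import time
import uuid

def emit_event(role, step_id, branch_id, state_patch, *,
               run_id, parent_event_id=None, retry_of=None,
               model_id=None, params=None, code_version=None,
               prompt_ref=None, tool_calls=None, output_ref=None,
               path="events.jsonl"):
    event = {
        "run_id": run_id,
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "role": role,
        "step_id": step_id,
        "branch_id": branch_id,
        "parent_event_id": parent_event_id,
        "retry_of": retry_of,
        "model_id": model_id,
        "params": params or {},
        "code_version": code_version,
        "prompt_ref": prompt_ref,
        "tool_calls": tool_calls or [],
        "output_ref": output_ref,
        "state_patch": state_patch,       # the delta, never the full state
    }
    # Canonical serialisation (sorted keys, compact separators) so the hash is stable.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    event["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    with open(path, "a") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```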
  3. The “multi-positional” rule: positioned writes + lanes
    LMP works because not every agent is allowed to edit everything. Each role gets an explicit write lane, and anything outside that lane is rejected. Example lanes that eliminate drift immediately:
  • Planner can write: plan, tasks[].spec, tasks[].priority, constraints
  • Executor can write: tasks[*].result, artifacts, tool_context, metrics
  • Reviewer can write: tasks[*].verification, decisions, errors, and can set a task to “needs_rework”
    This one rule prevents the reviewer from silently rewriting the plan, or the executor from rewriting the goal mid-run.
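
Here is a rough sketch of how those lanes can be enforced at commit time; the path patterns and the flat dotted-path convention are assumptions for illustration:

```python
# Sketch of positioned writes: each role may only patch paths inside its lane.
# Lane contents follow the examples above; "tasks.*.status" covers the reviewer
# setting a task to "needs_rework".
import fnmatch

WRITE_LANES = {
    "planner":  ["plan*", "tasks.*.spec", "tasks.*.priority", "constraints*"],
    "executor": ["tasks.*.result", "artifacts*", "tool_context*", "metrics*"],
    "reviewer": ["tasks.*.verification", "tasks.*.status", "decisions*", "errors*"],
}

def check_lane(role, patch_paths):
    """Reject a patch if any touched path falls outside the role's write lane."""
    allowed = WRITE_LANES[role]
    for path in patch_paths:
        if not any(fnmatch.fnmatch(path, pattern) for pattern in allowed):
            raise PermissionError(f"{role} is not allowed to write {path}")

check_lane("executor", ["tasks.t1.result", "metrics.tokens"])   # ok
try:
    check_lane("reviewer", ["plan.steps"])                       # outside the reviewer lane
except PermissionError as e:
    print(e)
```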
  4. Make updates patches, not blobs
    Each event should contain only the delta (state_patch), not a full re-dump of state. Use JSON Merge Patch (RFC 7386) or JSON Patch (RFC 6902). The merge semantics must be deterministic: same starting snapshot + same patches = same final state. That’s the core of reproducibility.
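
As a concrete reference, JSON Merge Patch is small enough to implement deterministically in a few lines; this sketch shows the semantics (recursive dict merge, null deletes a key, everything else replaces):

```python
# Sketch of deterministic patch application using JSON Merge Patch semantics (RFC 7386).
import copy

def apply_merge_patch(target, patch):
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)          # non-objects replace the target wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)            # null means "delete this key"
        else:
            result[key] = apply_merge_patch(result.get(key, {}), value)
    return result

state = {"plan": ["draft"], "tasks": {"t1": {"status": "pending"}}}
patch = {"tasks": {"t1": {"status": "done", "result": "ok"}}}
print(apply_merge_patch(state, patch))
# {'plan': ['draft'], 'tasks': {'t1': {'status': 'done', 'result': 'ok'}}}
```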

  5. Snapshot for speed, replay for truth
    Periodically write a snapshot.json (every N events, or after each major phase). The current state is defined as: latest snapshot + replay of patches along a branch path. Snapshots make recovery fast; replay makes behavior explainable.
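
A minimal snapshot policy might look like the sketch below; the interval, filenames, and payload layout are assumptions, and the point is only that snapshots are periodic materialisations of the same state the log would reconstruct:

```python
# Sketch: persist the materialised state every N events so recovery doesn't
# have to replay the whole log from the beginning.
import json
import os
import time

SNAPSHOT_EVERY = 20

def maybe_snapshot(state, event_count, snapshot_dir="snapshots"):
    if event_count % SNAPSHOT_EVERY != 0:
        return None
    os.makedirs(snapshot_dir, exist_ok=True)
    path = os.path.join(snapshot_dir, f"snapshot_{event_count:06d}_{int(time.time())}.json")
    with open(path, "w") as f:
        json.dump({"event_count": event_count, "state": state}, f, indent=2)
    return path
```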

  6. Branching and retries become explicit data, not control-flow chaos
    Branching: create a new branch_id whose first event points back to the divergence point via parent_event_id. You now have two replayable timelines from the same origin.
    Retries: emit a new event with the same step_id and set retry_of to the failed event id, plus a reason. This makes “why did we retry?” and “what changed?” inspectable.
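
As data, a retry and a branch opening are just ordinary events with their provenance fields set; the ids below are placeholders:

```python
# Sketch (abridged events): both are regular log entries, not hidden control flow.
retry_event = {
    "event_id": "evt-201",
    "step_id": "step-4",          # same step as the failed attempt
    "branch_id": "main",
    "retry_of": "evt-198",        # the failed event
    "state_patch": {"errors": [{"step": "step-4", "reason": "tool timeout"}]},
}

branch_event = {
    "event_id": "evt-210",
    "step_id": "step-5",
    "branch_id": "alt-plan-b",        # new timeline
    "parent_event_id": "evt-205",     # divergence point on the original branch
    "state_patch": {"plan": ["use cached data instead of the live tool"]},
}
```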

  7. Deterministic replay tool (this is what makes it real)
    Build a tiny replay.py that:

  • loads the latest snapshot
  • streams events in order for a chosen branch_id
  • validates event hashes
  • applies patches
  • outputs the reconstructed state.json plus a human trace (“planner wrote plan v3”, “executor ran tool X”, “reviewer rejected step 4”)
    If someone can replay your run from the event log and see the same decisions and state transitions, you’ve solved the reproducibility problem.
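
A sketch of what such a replay.py could look like, assuming the event, hash, and merge-patch conventions from the sketches above; the snapshot path is illustrative, and a real version would also skip events already covered by the snapshot:

```python
# replay.py: minimal deterministic reconstruction sketch.
import hashlib
import json

def apply_merge_patch(target, patch):
    # JSON Merge Patch: dicts merge recursively, None deletes, scalars/lists replace.
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key, {}), value)
    return result

def verify(event):
    # Recompute the hash over the event minus its own "sha256" field.
    claimed = event.pop("sha256", None)
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    actual = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    if claimed != actual:
        raise ValueError(f"hash mismatch on event {event.get('event_id')}")

def replay(branch_id, snapshot_path="snapshots/latest.json", log_path="events.jsonl"):
    with open(snapshot_path) as f:
        state = json.load(f)["state"]
    trace = []
    with open(log_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("branch_id") != branch_id:
                continue
            verify(event)
            state = apply_merge_patch(state, event["state_patch"])
            trace.append(f"{event['role']} wrote {sorted(event['state_patch'])} at {event['step_id']}")
    return state, trace

if __name__ == "__main__":
    final_state, trace = replay(branch_id="main")
    print(json.dumps(final_state, indent=2))
    print("\n".join(trace))
```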
  8. Minimal file layout that works in any repo/Space
  • state_schema.json (optional but recommended)
  • state.json (current)
  • events.jsonl (append-only)
  • snapshots/ (timestamped snapshots)
  • artifacts/ (tool outputs, results)
  • replay.py (deterministic reconstruction)
  • runner.py (orchestrator glue)
  • prompts/ (versioned templates)
  9. Practical wiring into a planner → executor → reviewer flow
    Each step follows the same template: read state.json; render a prompt from state (never from “memory”); run the model/tool; write an event containing the exact inputs/outputs and a state_patch limited to the role’s lane; optionally snapshot. If a reviewer rejects, it writes a decision + error object and either opens a new branch or schedules a retry by emitting a “retry requested” event. This is the piece most multi-agent stacks miss: if state transitions aren’t the primary artifact, you’ll always be guessing later.
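
Pulling the pieces together, one step of that template might look like the sketch below; check_lane, emit_event, apply_merge_patch, and maybe_snapshot are the hypothetical helpers sketched earlier, and render_prompt / call_agent are supplied by the caller (your model and tool wrappers) so the same template works for any role:

```python
# Sketch of one step of the template above, built on the hypothetical helpers
# from the earlier sketches (check_lane, emit_event, apply_merge_patch, maybe_snapshot).
import json

def patch_paths(patch, prefix=""):
    # Flatten {"tasks": {"t1": {"result": ...}}} into ["tasks.t1.result"] for lane checks.
    paths = []
    for key, value in patch.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            paths.extend(patch_paths(value, dotted + "."))
        else:
            paths.append(dotted)
    return paths

def run_step(role, step_id, branch_id, run_id, render_prompt, call_agent, event_count):
    # 1. Read the current state; never rely on in-process "memory".
    with open("state.json") as f:
        state = json.load(f)

    # 2. Render the prompt as a view over state, then run the model/tool.
    prompt = render_prompt(state)
    state_patch, artifact_ref = call_agent(prompt)

    # 3. Enforce the role's write lane before anything is committed.
    check_lane(role, patch_paths(state_patch))

    # 4. Record exactly one event, then materialise the new state.
    emit_event(role=role, step_id=step_id, branch_id=branch_id,
               state_patch=state_patch, run_id=run_id, output_ref=artifact_ref)
    new_state = apply_merge_patch(state, state_patch)
    with open("state.json", "w") as f:
        json.dump(new_state, f, indent=2)

    # 5. Optionally snapshot for fast recovery.
    maybe_snapshot(new_state, event_count + 1)
    return new_state
```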

If you’re evaluating Zenflow specifically, this pattern should map cleanly: treat Zenflow’s graph as the execution scaffold, but treat state.json + events.jsonl + replay as the authoritative record. Framework-level abstractions are fine, but they’re not a substitute for an explicit, replayable state machine.

Let me know if it helps. Liam
