Continual, interactive, causal agents

Modern language-model agents are usually built by stacking separate training regimes: pretraining, mid-training, supervised fine-tuning, preference modeling, rejection sampling, reinforcement learning, reasoning-specific tuning, self-distillation, and deployment-time patches. This is intelligence by design and engineering, as opposed to emergent intelligence. The multi-stage recipe is a local minimum for research: it has produced powerful systems, but it has no single semantics for an interaction transcript. User messages, tool outputs, demonstrations, model actions, verifier judgements, and corrections are often treated as if they were the same kind of evidence. This paper studies a simpler alternative: a continual, causal interaction stream. The central rule is that world-written tokens are evidence, whereas self-written agent tokens are interventions. In LLM fine-tuning this rule becomes a loss mask: keep the agent’s own attempts in the context, but remove them from the supervised target. In a small, reproducible STEM reasoning experiment, this interventional stream agent reaches held-out solve accuracy comparable to that of ReST, GRPO, and SFT with oracle corrections. The result is not a claim of benchmark dominance. It is a proof of viability for a single continual learning agent that can use interaction, causality, feedback, and corrections to achieve purposeful and useful behaviour.
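The loss-mask rule can be illustrated with a minimal sketch. This is not the paper's implementation: the `Segment` type, the `"world"`/`"agent"` author labels, and the `build_inputs_and_labels` helper are hypothetical names introduced here, and the token ids are arbitrary. The sketch assumes the common convention that a label of -100 is skipped by the cross-entropy loss (the default `ignore_index` in PyTorch), and it leaves the usual one-position shift between inputs and targets to the training loop.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: labels equal to -100 are ignored by the loss, matching the
# default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


@dataclass
class Segment:
    """One contiguous span of the interaction stream (hypothetical type)."""
    author: str            # "world" or "agent" (illustrative labels)
    token_ids: List[int]   # arbitrary token ids for illustration


def build_inputs_and_labels(stream: List[Segment]) -> Tuple[List[int], List[int]]:
    """Keep every token in the context; supervise only world-written tokens."""
    input_ids: List[int] = []
    labels: List[int] = []
    for seg in stream:
        # All tokens, including the agent's own attempts, stay in the context.
        input_ids.extend(seg.token_ids)
        if seg.author == "world":
            # World-written tokens are evidence: they remain supervised targets.
            labels.extend(seg.token_ids)
        else:
            # Self-written tokens are interventions: masked out of the target.
            labels.extend([IGNORE_INDEX] * len(seg.token_ids))
    return input_ids, labels


stream = [
    Segment("world", [11, 12, 13]),  # e.g. a user question
    Segment("agent", [21, 22]),      # the agent's own attempt
    Segment("world", [31, 32, 33]),  # e.g. a verifier judgement or correction
]
input_ids, labels = build_inputs_and_labels(stream)
```

Under these assumptions the agent's attempt still conditions the prediction of the later correction, but it contributes no gradient of its own as a target.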