Instantaneous goal-directed behavior and Surprise

Tracking a Moving Target by Minimizing Variational Free Energy: An Active Inference Perspective

Abstract

This article outlines the development of a computational system inspired by Active Inference, in which the two-agent system introduced in our previous work tracks a dynamically moving target by minimizing the system's variational free energy (VFE). We emphasize how unexpected target movements induce surprisal, triggering adaptive responses analogous to those seen in biological organisms. The system constructs an implicit internal world model by predicting the motor commands necessary to maintain alignment with the target, illustrating the intimate connection between action, perception, and prediction.

Introduction

Biological organisms continuously interact with uncertain environments, making sense of the world through sensory inputs and actively shaping their experience via motor actions. One influential framework to formalize this process is Active Inference, where perception and action emerge from the drive to minimize variational free energy (VFE)—a tractable proxy for surprise.
In this work, we implement an Active Inference-based system capable of tracking a moving target in a two-dimensional space. We focus on how the system handles unexpected changes in target position (surprisal), and how it implicitly constructs a world model through learned predictions of the motor commands needed to drive the eye muscles.

Active Inference in a Motor Domain

At its core, Active Inference entails predicting incoming sensory signals based on an internal generative model and updating beliefs to minimize the mismatch (prediction error) between these predictions and actual sensations. In motor domains, this takes a reflex-like form: the agent issues actions to minimize proprioceptive prediction errors, effectively moving its sensed position toward its inferred desired state.
In our system:
Proprioception provides feedback from body sensors, enabling the agents to update their beliefs about which motor commands should be issued to the eye muscles.
Vision supplies external sensory data about the position and trajectory of a moving target.
The agents iteratively minimize the mismatch between the two, updating beliefs and motor commands to reduce prediction errors. Prediction errors are weighted by precisions: more weight is given to error signals that are less noisy or uncertain.
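As a minimal sketch of this precision weighting, the update below combines a visual and a proprioceptive prediction error into a single belief update. All variable names and numerical values are illustrative, not taken from the actual implementation:

```python
def weighted_prediction_error(prediction, observation, precision):
    """Precision-weighted prediction error: more reliable signals get more weight."""
    return precision * (observation - prediction)

# Hypothetical one-step belief update combining two sensory channels.
belief = 0.0                                  # current belief about eye position
visual_obs, visual_precision = 1.0, 4.0       # low-noise visual signal
proprio_obs, proprio_precision = 0.5, 1.0     # noisier proprioceptive signal

learning_rate = 0.1
belief += learning_rate * (
    weighted_prediction_error(belief, visual_obs, visual_precision)
    + weighted_prediction_error(belief, proprio_obs, proprio_precision)
)
```

Because the visual channel carries four times the precision here, the updated belief is pulled mostly toward the visual observation.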

What are proprioceptive signals?

Proprioceptive signals are the sensory signals your body generates to sense its own position, movement, and internal state. Proprioception is often called your "sixth sense": the sense of where your body parts are in space without needing to look at them.
For example:
When you close your eyes and touch your nose with your finger, you rely on proprioception to guide your finger.
When you stand on one leg and balance, proprioceptive signals from your muscles, joints, and inner ear help you know your body's orientation and correct yourself.
Where do these signals come from?
Muscle spindles: Detect muscle stretch and length.
Golgi tendon organs: Detect tension in tendons.
Joint receptors: Sense joint position and movement.
Vestibular system: In the inner ear, helps with balance and spatial orientation.
In Active Inference models or robotics, proprioceptive signals are typically the agent's internal sensory feedback about its current motor state — like angles of joints, speed of actuators, or tension in artificial muscles — which the agent uses to infer and correct its body state relative to its predictions.

What is their relation to prior beliefs?

In Active Inference:

Prior beliefs are the agent's predictions or expectations about the world and its own body — including the expected position, movement, and configuration of its limbs or sensors.
Proprioceptive signals are the actual sensory inputs from the body about its current state (muscle length, joint position, movement, speed, etc.).

The relationship:

The core idea is that the brain (or agent) is constantly comparing its prior beliefs about proprioceptive states to the actual incoming proprioceptive signals. Any mismatch between these creates a prediction error (or "surprisal").
For example:
If your brain predicts your arm should be at a 90° angle but proprioceptive signals indicate it’s at 45°, that’s a proprioceptive prediction error.
The system then acts to minimize this error — either by:
Updating beliefs (maybe you were mistaken about the intended position)
Taking motor action to move the arm to 90°, aligning sensations with the prior belief.
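The second option, acting to align sensation with the prior belief, can be sketched as a reflex-like loop in which each motor command moves the arm by a fraction of the current proprioceptive prediction error. The gain value and iteration count are illustrative assumptions:

```python
def proprio_error(desired_deg, sensed_deg):
    """Proprioceptive prediction error: prior belief minus actual signal."""
    return desired_deg - sensed_deg

# Prior belief: arm at 90 degrees; proprioception reports 45 degrees.
desired, sensed = 90.0, 45.0
gain = 0.5  # reflex gain (hypothetical)

# Reflex-like loop: each motor command closes a fraction of the error.
for _ in range(20):
    sensed += gain * proprio_error(desired, sensed)
```

The sensed position converges geometrically toward the prior belief of 90 degrees, which is the sense in which action "fulfils" the prediction.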

Formally:

In Active Inference, the generative model specifies the probability of proprioceptive outcomes given certain hidden states:
P(proprioceptive signals | hidden states)
where the hidden states include the body's expected configuration under the prior beliefs. These states are called hidden because they are not observed directly: they are inferred expectations about immediate future variables. If living organisms waited for those variables to become directly observable, functioning in a rapidly changing environment would be impossible.
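One simple way to make this generative model concrete is to treat the proprioceptive signal as a noisy Gaussian readout of the hidden joint configuration. The function name, noise level, and angle below are illustrative assumptions, not part of the described system:

```python
import random

def generate_proprio(hidden_angle_deg, noise_sd=2.0):
    """Sample P(proprioceptive signal | hidden state):
    a noisy readout of the expected joint angle."""
    return random.gauss(hidden_angle_deg, noise_sd)

# Repeated samples scatter around the hidden (expected) configuration.
random.seed(0)
samples = [generate_proprio(90.0) for _ in range(1000)]
mean = sum(samples) / len(samples)
```

Under this model, inference runs in the opposite direction: given noisy samples, the agent estimates the hidden state that most plausibly generated them.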

Why is this important?

It allows agents (and biological organisms) to control their bodies by predicting the sensory consequences of their actions, rather than reacting purely to sensory feedback. This predictive control via prior beliefs is what makes Active Inference unique compared to classical control theory.

Scenario: Moving eyes to keep focus on a moving target

Goal: Keep your fovea (the high-acuity center of your retina) aligned with a moving target.

Key variables:

Target position in the visual field (external sensory signal): signals from the retina tell you where the target actually is relative to your fovea: env-input.
Eye position (proprioceptive signal): signals from the extraocular muscles tell you where your eyes actually are: proprio-input.
Prior belief about where your gaze should be (to minimize visual prediction error).
Motor Command, issued to minimize prediction errors: motor-out.

Target fig 2.png

How it works in Active Inference:

You have a prior expectation that your gaze (eye position) should track and stay aligned with the moving target — because that's usually adaptive for perception and survival.
Your brain predicts what the retinal image should look like if gaze is aligned with the target, and what proprioceptive signals from the eye muscles should be if gaze is at the target.
The brain computes mismatches between the predicted visual input (target centered on the fovea) and the actual visual input (target off-center).
It also computes mismatches between the predicted eye position (based on your expected movement) and the actual proprioceptive feedback (eyes at the wrong position).
The brain can now reduce these prediction errors by: Issuing motor commands to the eye muscles to move the eyes to the new predicted position (matching the new target position), and/or updating prior beliefs about the target's trajectory, if it behaves unexpectedly.
This happens continuously in a predictive feedback loop, keeping the eyes smoothly tracking the moving target.
In summary: Your brain holds a prior belief that your gaze should stay on the target. It predicts what visual and proprioceptive signals should result. Discrepancies (prediction errors) drive eye movements to align reality with the belief.
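The predictive feedback loop above can be sketched as a one-dimensional gaze controller, where each motor command is proportional to the visual prediction error. The gain and the drifting target trace are illustrative assumptions:

```python
def track(target_positions, gain=0.8):
    """Reflex-like gaze control: motor command proportional to visual error."""
    gaze = 0.0
    trace = []
    for target in target_positions:
        visual_error = target - gaze      # target off-center on the retina
        motor_out = gain * visual_error   # command to the extraocular muscles
        gaze += motor_out                 # new eye position (proprioceptive state)
        trace.append(gaze)
    return trace

# Gaze closes in on a target that drifts slowly rightward.
trace = track([1.0, 1.1, 1.2, 1.3, 1.4])
```

Each step shrinks the residual error, so the fovea stays close to the target as long as the target's motion remains roughly predictable.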

Surprisal and Unexpected Target Movements

When the target moves unexpectedly—i.e., in a way the agent's world-model did not anticipate—this causes a spike in surprisal. In information theory terms, surprisal is the negative log-probability of observing an event given one’s internal model. In biological terms, it's the internal equivalent of "surprise," triggering attention, learning, and adaptive behavior.
In our system:
Surprisal is computed from the sensory prediction errors.
When the target shifts unpredictably, prediction errors increase, driving up surprisal and variational free energy.
This prompts rapid belief updating and reflexive actions to bring the agent's perceived position back in alignment with the target.
This parallels how animals redirect attention or replan actions when confronted with unexpected events in their environment.
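To illustrate how an unexpected shift drives up surprisal, the sketch below scores a trace of visual prediction errors with a Gaussian negative log-probability. The error trace and noise level are hypothetical:

```python
import math

def surprisal(error, sigma=0.1):
    """Surprisal: negative log-probability of the prediction error
    under a zero-mean Gaussian sensory model."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + error**2 / (2 * sigma**2)

# Smooth tracking keeps errors small, until the target jumps at step 3.
errors = [0.02, 0.01, 0.03, 1.5, 0.4, 0.1]   # hypothetical error trace
spikes = [surprisal(e) for e in errors]
```

The quadratic term makes surprisal spike sharply at the unexpected jump, which is the signal that triggers rapid belief updating and corrective action.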

target_simulation_metrics_20250521_072329.png

Constructing an Internal World Model Through Predictive Motor Commands

Rather than explicitly modeling the external target’s dynamics, the agent builds an implicit world-model through motor prediction. By continuously inferring the motor commands required to maintain alignment with the target, the system internalizes a mapping between motor space and expected sensory consequences.
In practice:
The agent predicts sensory inputs (e.g. visual position of the target) based on an “educated guess” of motor commands.
By evaluating the free energy of these outcomes, it selects actions that reduce prediction error and uncertainty.
This process effectively enacts a counterfactual inference: "What actions should I take to minimize surprise?"
Over time, this shapes a predictive control strategy — not by learning the target’s movement explicitly, but by iteratively refining the mapping between motor actions and expected sensory outcomes.
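The counterfactual question "what action minimizes surprise?" can be sketched as evaluating candidate motor commands by the prediction error each would leave behind. Using squared error as a crude stand-in for the free energy of an outcome (an assumption of this sketch, not the full VFE computation):

```python
def expected_free_energy(candidate_command, gaze, predicted_target):
    """Squared visual error expected after executing the command
    (a crude stand-in for variational free energy)."""
    predicted_gaze = gaze + candidate_command
    return (predicted_target - predicted_gaze) ** 2

gaze, predicted_target = 0.0, 1.2
candidates = [c / 10 for c in range(-20, 21)]   # candidate motor commands
best = min(candidates, key=lambda c: expected_free_energy(c, gaze, predicted_target))
```

The selected command is the one predicted to land the fovea on the target, i.e. the action whose expected sensory consequences are least surprising.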

target_animated_2D_with_surprisal_20250521_074111.gif

Internalization of World Model

As the predictive system learns to anticipate sensory inputs, certain recurrent patterns of neural activity — corresponding to expected states of the environment — become energetically favorable. In predictive processing, this can be understood as regions in the system's state space where prediction error, VFE and surprise are minimized.
Over repeated exposure to certain environmental structures, the system's internal model of the world shapes neural dynamics so that activity naturally gravitates toward these low-error states. These states act as attractors, stabilizing the system's predictions and responses.
This is consistent with the idea of energy landscapes in neural networks, where attractor basins correspond to minima in a potential function related to prediction error or free energy.
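A minimal picture of such an attractor basin is gradient descent on a quadratic potential: wherever the state starts, the dynamics relax it into the low-error minimum. The potential, step size, and attractor location are illustrative:

```python
def potential_grad(x, minimum=2.0):
    """Gradient of a quadratic 'energy landscape' with one attractor basin."""
    return 2 * (x - minimum)

# Neural state relaxes into the low-prediction-error attractor at x = 2.
state = -1.0
for _ in range(200):
    state -= 0.05 * potential_grad(state)
```

In the full system the landscape is shaped by prediction error rather than hand-written, but the relaxation dynamics are analogous.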
In previous work, we showed how, in the context of a simple two-agent system without environmental input, the attractor corresponded to an equilibrium state characterized by one agent inferring the state of the other, in an infinite recursive loop.
In contrast, the attractor formed by the two-agent system in the current simulation is much more complex, internalizing an approximation of the target's trajectory through space and time.

VFE_animated_2D_trajectories_IR_scatter20250521_082103.gif