bordax.agents
bordax.agents.base
Agent base classes and simple policy/value MLP implementations.
This module defines the Agent abstract base class and several concrete agents and neural modules used by the project:
- Agent: abstract interface for agents (init, policy, action, value).
- BlankAgent: a simple uniform (random) discrete action agent.
- MLP / MLP_dtsemnet / MLP_boolean: small neural modules used as policy architectures (documented under bordax.agents.components).
- MLPPolicyValue / MLPPolicyValueContinuous: actor-critic wrappers that expose a policy (Categorical or Normal) and a value function.
- DQNAgent: a DQN agent with a Q-network and target network.
Docstrings are provided for classes and public methods to aid reading and automatic documentation generation.
Agent
Bases: ABC
Abstract base class for all agents.
Subclasses must implement init and policy. The action
method is provided as a JIT-compiled convenience wrapper around
policy. Override value if the agent supports a value function
(required for actor-critic algorithms such as PPO).
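The contract described above can be illustrated with a minimal, self-contained sketch. It does not import bordax: `SketchAgent` and `LinearLogitsAgent` are hypothetical stand-ins that mirror only the documented method names, and the toy `policy` returns raw logits rather than the distribution object the real interface returns.

```python
from abc import ABC, abstractmethod

import jax
import jax.numpy as jnp


class SketchAgent(ABC):
    """Hypothetical stand-in mirroring the documented Agent interface."""

    @abstractmethod
    def init(self, key, sample_obs):
        """Return freshly initialised parameters."""

    @abstractmethod
    def policy(self, params, obs, key):
        """Return policy outputs for a batch of observations."""

    def value(self, params, obs):
        # Default for agents without a critic, as documented for Agent.value.
        raise NotImplementedError("this agent has no value function")


class LinearLogitsAgent(SketchAgent):
    """Toy subclass: a single linear layer producing categorical logits."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def init(self, key, sample_obs):
        obs_dim = sample_obs.shape[-1]
        return {"w": 0.01 * jax.random.normal(key, (obs_dim, self.num_actions))}

    def policy(self, params, obs, key):
        # `key` is unused here; it is accepted only to match the interface.
        return obs @ params["w"]


agent = LinearLogitsAgent(num_actions=3)
params = agent.init(jax.random.PRNGKey(0), jnp.zeros((1, 4)))
logits = agent.policy(params, jnp.ones((5, 4)), jax.random.PRNGKey(1))
```

Because the toy subclass does not override `value`, calling it raises `NotImplementedError`, which is the behaviour the base class documents for agents without a value function.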
Source code in bordax/agents/base.py
action(params, obs, key, is_deterministic=False)
Sample or select an action from the policy distribution.
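A rough sketch of what "sample or select" could mean for a categorical policy, assuming `is_deterministic=True` picks the most probable action and `False` draws a sample. The helper `select_action` is hypothetical and works on raw logits instead of the library's distribution object.

```python
import jax
import jax.numpy as jnp


def select_action(logits, key, is_deterministic=False):
    """Pick one action per row of `logits` (shape: batch x num_actions)."""
    if is_deterministic:
        # Deterministic branch: take the mode of the categorical distribution.
        return jnp.argmax(logits, axis=-1)
    # Stochastic branch: draw one sample per row.
    return jax.random.categorical(key, logits, axis=-1)


# Under jax.jit a Python-level branch needs the flag to be a static argument.
jit_select = jax.jit(select_action, static_argnames="is_deterministic")

logits = jnp.array([[0.0, 5.0, 0.0], [9.0, 0.0, 0.0]])
key = jax.random.PRNGKey(0)
greedy = jit_select(logits, key, is_deterministic=True)
sampled = jit_select(logits, key)
```

Marking the flag as static means JAX compiles one version per flag value, which keeps the `if` an ordinary Python branch.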
Source code in bordax/agents/base.py
init(key, sample_obs)
abstractmethod
Initialise network parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `PRNGKey` | JAX random key for weight initialisation. | required |
| `sample_obs` | `Any` | A sample observation with the correct shape (including the … | required |
Returns:
| Type | Description |
|---|---|
| `AgentParameters` | An … |
| `AgentParameters` | or … |
Source code in bordax/agents/base.py
policy(params, obs, key)
abstractmethod
Compute the policy distribution for a batch of observations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `AgentParameters` | Current network parameters. | required |
| `obs` | `Any` | Batch of observations, shape … | required |
| `key` | `PRNGKey` | JAX random key (for stochastic policy heads). | required |
Returns:
| Type | Description |
|---|---|
| `DistributionLike` | Tuple of … |
| `Mapping[str, Any]` | Distrax distribution and … |
Source code in bordax/agents/base.py
value(params, obs)
Compute the value estimate for a batch of observations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `params` | `Params` | Current network parameters. | required |
| `obs` | `Any` | Batch of observations, shape … | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Value estimates, shape … |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the agent has no value function. |
Source code in bordax/agents/base.py
BlankAgent
Bases: Agent
A trivial agent that returns a uniform categorical policy.
Source code in bordax/agents/base.py
policy(params, obs, key)
Return a uniform categorical distribution over actions.
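As a small illustration of a uniform categorical policy, the sketch below uses all-zero logits, which softmax maps to equal probabilities. It works on plain JAX arrays; the batch size and number of actions are arbitrary example values, not taken from the library.

```python
import jax
import jax.numpy as jnp

num_actions = 4  # example value
batch = 3        # example value

# Equal logits give equal probabilities after softmax.
logits = jnp.zeros((batch, num_actions))
probs = jax.nn.softmax(logits, axis=-1)

# Sampling from these logits picks each action with probability 1 / num_actions.
actions = jax.random.categorical(jax.random.PRNGKey(0), logits, axis=-1)
```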
Source code in bordax/agents/base.py
DQNAgent
Bases: Agent
A DQN agent with a Q-network and target network.
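The roles of the two networks can be sketched as follows. This is a generic DQN illustration under assumptions, not the `DQNAgent` implementation: the linear Q-function, the helper names, and the hard-copy target update are all hypothetical choices made for the example.

```python
import jax
import jax.numpy as jnp


def q_values(params, obs):
    """Toy linear Q-function: one value per action."""
    return obs @ params["w"] + params["b"]


def greedy_action(params, obs):
    """Act greedily with respect to the online Q-network."""
    return jnp.argmax(q_values(params, obs), axis=-1)


def td_target(target_params, reward, next_obs, done, gamma=0.99):
    """Bootstrap from the target network, not the online one."""
    next_q = jnp.max(q_values(target_params, next_obs), axis=-1)
    return reward + gamma * (1.0 - done) * next_q


def hard_update(online_params):
    """Copy the online parameters into a new target parameter tree."""
    return jax.tree_util.tree_map(lambda p: p, online_params)


online = {"w": jnp.eye(2), "b": jnp.zeros(2)}
target = hard_update(online)
obs = jnp.array([[1.0, 3.0]])

action = greedy_action(online, obs)
target_value = td_target(target, jnp.array([1.0]), obs, jnp.array([0.0]), gamma=0.5)
```

Keeping a separate target parameter tree means the bootstrap term changes only when `hard_update` is called, which is the usual motivation for a target network.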
Source code in bordax/agents/base.py
MLPPolicyValue
Bases: Agent
Actor-critic wrapper for discrete actions.
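A minimal sketch of the actor-critic idea for discrete actions: one head produces categorical logits, the other a scalar value per observation. The single linear layers and the names `init_params`, `policy_logits`, and `value_estimate` are assumptions for illustration; the actual wrapper uses MLP modules and returns a distribution object.

```python
import jax
import jax.numpy as jnp


def init_params(key, obs_dim, num_actions):
    k_pi, k_v = jax.random.split(key)
    return {
        "pi": 0.1 * jax.random.normal(k_pi, (obs_dim, num_actions)),
        "v": 0.1 * jax.random.normal(k_v, (obs_dim, 1)),
    }


def policy_logits(params, obs):
    """Actor head: unnormalised log-probabilities over actions."""
    return obs @ params["pi"]


def value_estimate(params, obs):
    """Critic head: one scalar per observation."""
    return (obs @ params["v"]).squeeze(-1)


params = init_params(jax.random.PRNGKey(0), obs_dim=4, num_actions=3)
obs = jnp.ones((5, 4))
log_probs = jax.nn.log_softmax(policy_logits(params, obs), axis=-1)
values = value_estimate(params, obs)
```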
Source code in bordax/agents/base.py
MLPPolicyValueContinuous
Bases: Agent
Actor-critic wrapper for continuous actions.
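For continuous actions the policy is a Normal distribution. The sketch below shows sampling and the log-density of a diagonal Gaussian in plain JAX. The state-independent `log_std` and the helper names are assumptions for the example, not details confirmed for `MLPPolicyValueContinuous`.

```python
import jax
import jax.numpy as jnp


def normal_log_prob(action, mean, log_std):
    """Log-density of a diagonal Normal, summed over action dimensions."""
    z = (action - mean) / jnp.exp(log_std)
    per_dim = -0.5 * z**2 - log_std - 0.5 * jnp.log(2.0 * jnp.pi)
    return jnp.sum(per_dim, axis=-1)


def sample_action(mean, log_std, key):
    """Reparameterised sample: mean + std * standard normal noise."""
    noise = jax.random.normal(key, mean.shape)
    return mean + jnp.exp(log_std) * noise


mean = jnp.zeros((5, 2))
log_std = jnp.zeros(2)  # assumed state-independent, std = 1
action = sample_action(mean, log_std, jax.random.PRNGKey(0))
log_prob = normal_log_prob(action, mean, log_std)
```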
Source code in bordax/agents/base.py
bordax.agents.components
MLP
Bases: Module
Source code in bordax/agents/components.py
layer_sizes
instance-attribute
Simple fully-connected MLP used for policy/value heads.
The module constructs a sequence of Dense layers using layer_sizes.
The final layer is returned without an activation.
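The layer structure described above can be sketched without any neural-network library. The `tanh` hidden activation and the initialisation scale are assumptions; the text only states that the final layer has no activation.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, in_dim, layer_sizes):
    """Create one (weight, bias) pair per entry in `layer_sizes`."""
    params = []
    for out_dim in layer_sizes:
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (in_dim, out_dim)) / jnp.sqrt(in_dim)
        params.append((w, jnp.zeros(out_dim)))
        in_dim = out_dim
    return params


def mlp_forward(params, x):
    *hidden, (w_out, b_out) = params
    for w, b in hidden:
        x = jnp.tanh(x @ w + b)  # hidden activation is an assumption
    return x @ w_out + b_out     # final layer: no activation


params = init_mlp(jax.random.PRNGKey(0), in_dim=4, layer_sizes=(8, 8, 2))
out = mlp_forward(params, jnp.ones((5, 4)))
```

Leaving the last layer linear lets the caller interpret the outputs as logits or as an unbounded value estimate.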
MLP_boolean
Bases: Module
Source code in bordax/agents/components.py
action_dim
instance-attribute
Boolean-function-inspired dense module.
The module constructs a mapping from inputs to outputs by interpreting
the learned dense layer as coefficients over the truth table of all
boolean functions with n inputs. The outputs are reduced per
action_dim using a max operation.
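Two ingredients of this description can be shown in isolation: enumerating the truth table of n boolean inputs, and reducing a flat output vector to one score per action with a max. How the module actually connects the two is not specified here, so the grouping layout and the `group` size below are purely hypothetical.

```python
import itertools

import jax.numpy as jnp

n = 2
# All 2**n input assignments for n boolean inputs, one per row.
truth_table = jnp.array(list(itertools.product([0, 1], repeat=n)))

action_dim = 3
group = 4  # hypothetical number of outputs feeding each action

# Stand-in for the dense layer's output: action_dim * group values.
flat = jnp.arange(action_dim * group, dtype=jnp.float32)

# Group the outputs per action and keep the maximum of each group.
per_action = flat.reshape(action_dim, group).max(axis=-1)
```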
MLP_dtsemnet
Bases: Module
Source code in bordax/agents/components.py
action_dim
instance-attribute
A decision-tree-inspired dense module.
This module builds an internal representation derived from a binary
tree structure of depth tree_depth and maps inputs to action_dim
outputs. It is an experimental architecture used as an alternative
policy head.
__call__(x)
Compute the forward pass for the tree-based representation.
The implementation supports both single-example inputs (1D) and batched inputs (2D). Returns an array shaped (batch, action_dim).
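One common way to accept both 1D and 2D inputs is to promote a single example to a batch of one before the main computation. The sketch below assumes that approach and uses a plain matrix product in place of the tree-based computation; `forward` is a hypothetical name.

```python
import jax.numpy as jnp


def forward(w, x):
    # Promote a single example (obs_dim,) to a batch of one (1, obs_dim).
    x = jnp.atleast_2d(x)
    return x @ w  # result has shape (batch, action_dim)


w = jnp.ones((4, 3))  # obs_dim = 4, action_dim = 3
single = forward(w, jnp.ones(4))
batched = forward(w, jnp.ones((5, 4)))
```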
Source code in bordax/agents/components.py
bordax.agents.utils
make_agent(agent_name, env, agent_config={})
Create an agent by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `agent_name` | `str` | Identifier in the form … | required |
| `env` | `EnvAdapter` | Environment adapter used to infer observation and action spaces. | required |
| `agent_config` | `dict` | Dict of hyperparameters passed to the agent constructor. Required keys depend on the agent type (e.g. … | `{}` |
Returns:
| Type | Description |
|---|---|
| `Agent` | An initialised … |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If … |
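A name-based factory of this kind is often built around a lookup table. The sketch below is a generic illustration, not the `make_agent` implementation: `build_agent`, `ToyAgent`, and the registry are hypothetical, the name format is invented for the example, and raising `ValueError` for an unrecognised name is an assumption, since the exact condition is not recoverable from the table above.

```python
def build_agent(agent_name, registry, agent_config=None):
    """Look up a constructor by name and call it with the given config."""
    # Copy the config so a shared or default dict is never mutated.
    config = dict(agent_config or {})
    try:
        constructor = registry[agent_name]
    except KeyError:
        raise ValueError(f"unknown agent name: {agent_name!r}") from None
    return constructor(**config)


class ToyAgent:
    """Hypothetical agent with a single hyperparameter."""

    def __init__(self, hidden_size=32):
        self.hidden_size = hidden_size


registry = {"toy": ToyAgent}
agent = build_agent("toy", registry, {"hidden_size": 64})
```

The sketch uses `None` as the default and copies the config because a mutable default such as `{}` is shared across calls in Python, which matters if the function ever modifies it.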
Source code in bordax/agents/utils.py