AFABL Agent Quickstart

An AFABL agent acts in a particular world, is composed of independent behavior modules pursuing their own continuing goals, and has an agent-level reward function that it uses to learn when to listen to each module. Here’s a definition of the state for a simple Pac Man world that doesn’t include cherries:

case class Location(x: Int, y: Int)

case class PacManState(
  pacMan: Location,
  ghost: Location,
  food: Seq[Location]
)

Here’s a definition of a set of Pac Man actions:

object PacManAction extends Enumeration {
  type PacManAction = Value
  val Up, Down, Left, Right = Value
}
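
AFABL’s world classes (like the PacManWorld used below) are responsible for applying actions to states. Purely for illustration, here is how a grid world might apply one of these actions to a Location; the move helper is hypothetical, not part of AFABL or PacManWorld, and it assumes y grows downward:

import PacManAction._

// Hypothetical helper (not part of AFABL): applies an action to a Location
// the way a grid world's transition function might. Assumes y increases downward.
def move(loc: Location, action: PacManAction): Location = action match {
  case Up    => loc.copy(y = loc.y - 1)
  case Down  => loc.copy(y = loc.y + 1)
  case Left  => loc.copy(x = loc.x - 1)
  case Right => loc.copy(x = loc.x + 1)
}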

And here is a complete AFABL Pac Man agent with two behavior modules. Read the comments for explanations and tips:

// Each module can have a state abstraction (optional, but recommended)
case class FindFoodState(pacMan: Location, food: Seq[Location])

val findFood = AfablModule(
  world = new PacManWorld,

  // If you defined a state abstraction class for this module, the
  // stateAbstraction function converts a world state into a module state.
  // If you did not define a state abstraction, simply have this function
  // return the world state.
  stateAbstraction = (worldState: PacManState) => {
    FindFoodState(worldState.pacMan, worldState.food)
  },

  // The moduleReward function returns the reward for getting into
  // a particular state. Typically an if-ladder with state predicates.
  // There should be one or more states with a small positive reward --
  // the goal(s) of this module -- and an else clause with a very small
  // negative reward. Giving non-goal states a very small negative reward
  // is a standard authoring technique that makes reinforcement learning
  // algorithms work better.
  moduleReward = (moduleState: FindFoodState) => {
    if (moduleState.food.contains(moduleState.pacMan)) 1.0
    else -0.1
  }
)
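
Because moduleReward is an ordinary Scala function, you can sanity-check the reward logic on hand-built states (in the Scala REPL, for example) before training. This is just a sketch; the findFoodReward val and the sample states below are made up for the check and are not part of the AFABL API:

// A standalone copy of the reward function passed to AfablModule above.
val findFoodReward = (s: FindFoodState) =>
  if (s.food.contains(s.pacMan)) 1.0 else -0.1

// Hand-built states for a quick check.
val onFood  = FindFoodState(Location(2, 3), Seq(Location(2, 3), Location(5, 1)))
val offFood = FindFoodState(Location(0, 0), Seq(Location(2, 3), Location(5, 1)))

assert(findFoodReward(onFood) == 1.0)   // goal state: Pac Man is on a food pellet
assert(findFoodReward(offFood) == -0.1) // non-goal state: small negative reward

Checks like these are an easy way to make sure the predicates in your if-ladder actually describe the states you intend to reward.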

case class AvoidGhostState(pacMan: Location, ghost: Location)

val avoidGhost = AfablModule(
  world = new PacManWorld,

  stateAbstraction = (worldState: PacManState) => {
    AvoidGhostState(worldState.pacMan, worldState.ghost)
  },

  // This example shows a module whose "goal" is to avoid a particular
  // state. The rewards are the negatives of what you'd use for a
  // "seeking" goal like findFood above.
  moduleReward = (moduleState: AvoidGhostState) => {
    if (moduleState.pacMan == moduleState.ghost) -1.0
    else 0.1
  }
)

val pacMan = AfablAgent(
  // Same world as the modules
  world = new PacManWorld,

  // All the behavior modules that make up this agent
  modules = Seq(findFood, avoidGhost),

  // The agent-level reward is different from the module rewards. Think of it
  // as a game score. There should be a world state predicate for each goal
  // or avoidance state in the modules. States from the goals of "seeking"
  // modules should get a small positive reward. Avoidance states from
  // avoidance modules should get a reward of 0. The else clause should have
  // some very small positive reward that means "didn't meet a goal,
  // but didn't die." This example is simple, but in general you can vary
  // these rewards to prioritize different modules when there's more than one
  // "seeking" module.
  agentLevelReward = (state: PacManState) => {
    if (state.pacMan == state.ghost) 0.0
    else if (state.food.contains(state.pacMan)) 1.0
    else 0.1
  }
)
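
Once the agent is defined you train it and then let it pick actions in the world. This quickstart doesn’t cover AFABL’s training and simulation API, so the loop below is only a sketch: the train, init, getAction, and act method names are assumptions used for illustration, not necessarily the names AFABL actually uses.

// Hypothetical usage sketch -- train, init, getAction, and act are
// placeholder names, not necessarily AFABL's real API.
val world = new PacManWorld
pacMan.train()                          // learn module policies and module arbitration

var state = world.init()                // start an episode
for (_ <- 1 to 100) {
  val action = pacMan.getAction(state)  // agent selects an action for the current state
  state = world.act(state, action)      // world applies the action, returns the next state
}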

So the general AFABL agent authoring process is:

  • Create behavior modules with state abstractions and module reward functions, where each module is coded “selfishly,” i.e., without regard for any other behavior modules and their goals.
  • Create an AFABL agent with a Seq of the modules created above and an agent-level reward function.

The key idea is that you’re authoring behavior declaratively. Instead of writing detailed action selection logic, you’re simply saying which states are good and which states are bad.

Next we’ll look at a simple Pac Man agent written in bare Scala, then discuss the elements of AFABL agents in greater detail using a slightly more complex Pac Man world that includes cherries.