AFABL Modules

Below is the complete code for an AFABL implementation of a behavior module that represents the goal of finding food.

case class FindFoodState(pacMan: Location, food: Seq[Location])

val findFood = AfablModule(
  world = new PacManWorld,

  stateAbstraction = (worldState: PacManState) => {
    FindFoodState(worldState.pacMan, worldState.food)
  },

  moduleReward = (moduleState: FindFoodState) => {
    if (moduleState.food.contains(moduleState.pacMan)) 1.0
    else -0.1
  }
)

Let’s consider each element of the definition above. First is the definition of a case class, FindFoodState, to represent the state abstraction for the FindFood module. FindFoodState includes only two of the state variables in the Pac Man world (yes, food is a Seq, but we don’t have to think of the food pellets individually).

case class FindFoodState(pacMan: Location, food: Seq[Location])

We will use this state abstraction class in the stateAbstraction function below.

We store a reference to an AfablModule for FindFood in findFood.

val findFood = AfablModule(

The AfablModule factory method takes three arguments: an instance of a World that the module can act and learn in, a stateAbstraction function, and a moduleReward function.
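Although you don’t need to know AFABL’s internals to use it, it may help to picture the factory’s shape. The sketch below is hypothetical, not AFABL’s actual signature; it is meant only to show how the three arguments relate through their types:

object AfablModule {
  def apply[WorldState, Action, ModuleState](
      world: World[WorldState, Action],
      stateAbstraction: WorldState => ModuleState,
      moduleReward: ModuleState => Double
  ): AfablModule[WorldState, Action, ModuleState] = ???
}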

The first argument to AfablModule is the world:

world = new PacManWorld

The name world and the = must appear verbatim; they are part of the AFABL language.

The second argument is a state abstraction function that takes a world-state object as a parameter and returns an instance of our state abstraction class:

stateAbstraction = (worldState: PacManState) => {
  FindFoodState(worldState.pacMan, worldState.food)
}

The name stateAbstraction and the = must appear verbatim; they are part of the AFABL language. worldState is a user-chosen name, but PacManState must match the state type defined for the world in which the module and agent operate; in this case it is the first type parameter to World in the PacManWorld code above. The last expression in the body of the stateAbstraction function must be an instance of the module state class, in this case FindFoodState.

The stateAbstraction function is optional. If you don’t supply your own state abstraction function, a default is provided: the identity function, i.e., no state abstraction. However, your modules will learn faster and perform better if you provide a state abstraction class and function.
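For example, a version of findFood that relies on the default identity abstraction would pass the whole world state to its reward function. The name findFoodDefault below is ours, chosen for illustration:

val findFoodDefault = AfablModule(
  world = new PacManWorld,

  moduleReward = (state: PacManState) => {
    if (state.food.contains(state.pacMan)) 1.0
    else -0.1
  }
)

Note that the reward function’s parameter is now the world state type, PacManState, rather than a module state class.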

The third and final argument to the AfablModule factory method is a module reward function that takes an instance of our state abstraction class and returns the reward this module receives for being in that state:

moduleReward = (moduleState: FindFoodState) => {
  if (moduleState.food.contains(moduleState.pacMan)) 1.0
  else -0.1
}

The name moduleReward and the = must appear verbatim; they are part of the AFABL language. moduleState is a user-chosen name, but the parameter type, FindFoodState in this example, must match the return type of the stateAbstraction function (or the world state type, PacManState, if you chose not to create a state abstraction class and function). The last expression in the body of the moduleReward function must be a Double value. In this case, which is typical, the body of the moduleReward function is an if expression that simply returns the reward based on state predicates. This is another case where we could have implemented DSL-specific syntax, such as a list of predicates and values, but the syntactic overhead of Scala’s if expression is minimal and the code is crystal clear to any Scala programmer.

This moduleReward function gives the module a reward of 1.0 each time Pac Man finds a food pellet and -0.1 for each time step in which Pac Man does not eat. This illustrates a basic technique in reward authoring: there should be a small negative reward for not moving toward a goal state, in this case finding food pellets. Also note that moduleReward is not the same as a score. The job of the moduleReward function is to guide the behavior of this behavior module of the Pac Man agent. It is a declarative specification of behavior: it specifies which states are “good” and which states are “bad”. AFABL uses this information to derive a control policy: given the state, which action should the Pac Man agent execute to maximize its reward.
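To make “derive a control policy” concrete, the sketch below shows the kind of temporal-difference update that modular reinforcement learners commonly use to turn rewards into state-action values. It is illustrative only; the names qLearningUpdate, qTable, alpha, and gamma are ours, not AFABL’s internals:

def qLearningUpdate[S, A](
  qTable: scala.collection.mutable.Map[(S, A), Double],
  state: S, action: A, reward: Double, nextState: S,
  actions: Seq[A], alpha: Double = 0.1, gamma: Double = 0.9
): Unit = {
  // Estimate the best value obtainable from the next state.
  val bestNext = actions.map(a => qTable.getOrElse((nextState, a), 0.0)).max
  val oldQ = qTable.getOrElse((state, action), 0.0)
  // Nudge Q(state, action) toward reward + discounted future value.
  qTable((state, action)) = oldQ + alpha * (reward + gamma * bestNext - oldQ)
}

Over many simulation steps, the action with the highest learned value in each state becomes the module’s preferred action.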

These three components (world, state abstraction, and module reward) define a module-specific learning problem over a subset of the world in which the module, and the agent containing the module, may act. Each module is selfish, ignoring any other behavior modules or goals the agent may have, such as avoiding ghosts.

In addition to FindFood, we would create behavior modules for each of the other goals the Pac Man agent must continuously pursue: avoiding the ghost and finding cherries.

case class AvoidGhostState(pacMan: Location, ghost: Location)

val avoidGhost = AfablModule(
  world = new PacManWorld,

  stateAbstraction = (worldState: PacManState) => {
    AvoidGhostState(worldState.pacMan, worldState.ghost)
  },

  moduleReward = (moduleState: AvoidGhostState) => {
    if (moduleState.pacMan == moduleState.ghost) -1.0
    else 0.5
  }
)

case class FindCherriesState(pacMan: Location, cherries: Seq[Location],
                             ghost: Location, cherryActive: Boolean)

val findCherries = AfablModule(
  world = new PacManWorld,

  stateAbstraction = (worldState: PacManState) => {
    FindCherriesState(worldState.pacMan, worldState.cherries,
                      worldState.ghost, worldState.cherryActive)
  },

  moduleReward = (moduleState: FindCherriesState) => {
    if (moduleState.cherries.contains(moduleState.pacMan)) 1.0
    else if (moduleState.cherryActive && (moduleState.pacMan == moduleState.ghost)) 10.0
    else -0.1
  }
)

Note that FindCherriesState carries ghost and cherryActive in addition to pacMan and cherries: a module’s state abstraction must include every state variable its reward function reads, and this module rewards catching the ghost while a cherry is active.

The key idea here is that each behavior module is written in isolation, without regard for the other goals being pursued by other modules. If the ghost is close, Pac Man should move away, but the FindFood module will still want to pursue food because it selfishly doesn’t care about the ghost. Being able to write these behavior modules in isolation tremendously simplifies their code. As we discuss below, AFABL agents allow us to combine these modules into a single agent and declaratively specify how their action preferences should be considered by the agent.
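As a preview, combining the modules takes roughly the following shape. The details, including the agent-level reward, are covered below; treat this sketch as illustrative rather than definitive:

val pacMan = AfablAgent(
  world = new PacManWorld,
  modules = Seq(findFood, avoidGhost, findCherries),

  agentLevelReward = (state: PacManState) => {
    if (state.pacMan == state.ghost) 0.0
    else if (state.food.contains(state.pacMan)) 1.0
    else 0.5
  }
)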