Scala Agent Quickstart

To appreciate the benefit of AFABL's declarative, reinforcement learning-based behavior specification syntax, we present an outline of a Pac Man agent written in bare Scala without AFABL's DSL. In AFABL worlds, every agent implements an Agent trait (AFABL does this automatically):

/**
 * An agent is an object that decides what to do given the state of the world
 * via its getAction method.
 */
trait Agent[State, Action] {

  def getAction(state: State, shouldExplore: Boolean = false): Action
}

AFABL agents implement this method using reinforcement learning algorithms. Writing an agent in Scala means writing the action selection logic in getAction by hand.
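
The agent below refers to a few supporting types. The following is a minimal sketch of what they might look like; the field and member names are inferred from the agent code, and the actual world definition may differ:

case class Location(x: Int, y: Int)

// World state as the agent sees it: the positions of Pac Man and the ghost,
// and whether a cherry power-up is currently active.
case class PacManState(pacMan: Location, ghost: Location, cherryActive: Boolean)

sealed trait PacManAction
object PacManAction {
  case object Up    extends PacManAction
  case object Down  extends PacManAction
  case object Left  extends PacManAction
  case object Right extends PacManAction
}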

import scala.math.{pow, sqrt}

class PacMan extends Agent[PacManState, PacManAction] {

  // How close is "too close"? The programmer must choose a value.
  val GHOST_PROXIMITY_THRESHOLD: Double = ???

  // shouldExplore is meaningless for a hand-coded policy, so it is ignored here.
  def getAction(state: PacManState, shouldExplore: Boolean): PacManAction = {
    if (ghostTooClose(state))
      moveAwayFromGhost(state)
    else if (state.cherryActive)
      chaseGhost(state)
    else
      findFood(state)
  }

  def ghostTooClose(state: PacManState): Boolean = {
    distance(state.pacMan, state.ghost) < GHOST_PROXIMITY_THRESHOLD
  }

  def distance(a: Location, b: Location): Double = {
    sqrt(pow(a.x - b.x, 2) + pow(a.y - b.y, 2))
  }

  def moveAwayFromGhost(state: PacManState): PacManAction = {
    // Arbitrarily prefers moving in the x direction before the y direction;
    // the Up/Down cases assume screen coordinates (y grows downward).
    if (state.ghost.x < state.pacMan.x)
      PacManAction.Right
    else if (state.ghost.x > state.pacMan.x)
      PacManAction.Left
    else if (state.ghost.y > state.pacMan.y)
      PacManAction.Up
    else
      PacManAction.Down
  }

  def chaseGhost(state: PacManState): PacManAction = {
    // ...
    ???
  }

  def findFood(state: PacManState): PacManAction = {
    // ...
    ???
  }
}
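
With the hypothetical types sketched above, the hand-written agent is queried like any other Agent. The snippet below compiles, but it only runs once the ??? placeholders and the elided method bodies are filled in:

val agent = new PacMan
val state = PacManState(
  pacMan = Location(3, 4),
  ghost = Location(8, 4),
  cherryActive = false
)
// Dispatches to moveAwayFromGhost, chaseGhost, or findFood depending on the state.
val action = agent.getAction(state)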

This agent is written naively, but it is easy to see the complexity. How close is "too close"? Should Pac Man always move in the x direction before the y direction? We haven't written code to decide; that ordering simply falls out of the if-else structure. Is Cartesian distance the right measure of proximity? All of these considerations must be coded explicitly in the Scala agent, and, more importantly, all of these low-level details of the world must be thought through by the programmer, imposing a heavy cognitive burden. In AFABL, all of these details are learned automatically from the reward functions of the modules and the agent.
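
For contrast, here is an illustrative sketch of what a declarative, reward-based module for ghost avoidance might look like. The names AfablModule, PacManWorld, stateAbstraction, and moduleReward are assumptions for illustration, not a verified copy of AFABL's API:

// Illustrative only: constructor and parameter names are assumed, not AFABL's
// documented API. The point is that only a state abstraction and a reward
// function are written by hand; the policy itself is learned.
case class AvoidGhostState(pacMan: Location, ghost: Location)

val avoidGhost = AfablModule(
  world = new PacManWorld,
  stateAbstraction = (s: PacManState) => AvoidGhostState(s.pacMan, s.ghost),
  moduleReward = (ms: AvoidGhostState) =>
    if (ms.pacMan == ms.ghost) -1.0  // caught by the ghost
    else 0.1                         // still avoiding it
)

No proximity thresholds, distance metrics, or movement-order decisions appear; the behavior that avoids the ghost emerges from learning against the reward function.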