Taxi

Description

The Taxi-v3 environment is a classic reinforcement learning problem from OpenAI Gym that simulates a grid-based world where an autonomous taxi must efficiently pick up and drop off passengers at designated locations. Despite its simplicity, it provides an excellent testbed for exploring multi-objective reinforcement learning (MORL).

The pickup and drop-off points are randomly assigned at the start of each episode, ensuring dynamic task scenarios.

(Video: the taxi navigating the grid world)

Action Space

The action space is discrete, with six actions in the range {0, …, 5} that either move the taxi or pick up / drop off the passenger.

  • 0: Move south (down)
  • 1: Move north (up)
  • 2: Move east (right)
  • 3: Move west (left)
  • 4: Pickup passenger
  • 5: Drop off passenger
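The movement semantics above can be sketched in plain Python. This is an illustrative model, not the environment's source: it clamps moves at the edges of the 5×5 grid (as Taxi-v3 does) but ignores the interior walls that also block east/west movement in the real environment.

```python
# The six Taxi-v3 actions, in the order listed above.
SOUTH, NORTH, EAST, WEST, PICKUP, DROPOFF = range(6)

def move(row: int, col: int, action: int) -> tuple[int, int]:
    """Return the taxi's new (row, col) after an action on the 5x5 grid.

    Moves that would leave the grid keep the taxi in place; interior
    walls are ignored in this sketch. Pickup/drop-off do not move the taxi.
    """
    if action == SOUTH:
        row = min(row + 1, 4)
    elif action == NORTH:
        row = max(row - 1, 0)
    elif action == EAST:
        col = min(col + 1, 4)
    elif action == WEST:
        col = max(col - 1, 0)
    return row, col
```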

Observation Space

There are 500 discrete states since there are 25 taxi positions, 5 possible locations of the passenger (including the case when the passenger is in the taxi), and 4 destination locations.

Passenger locations:

  • 0: Red
  • 1: Green
  • 2: Yellow
  • 3: Blue
  • 4: In taxi

Destinations:

  • 0: Red
  • 1: Green
  • 2: Yellow
  • 3: Blue
An observation is returned as an int that encodes the corresponding state, calculated as ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination.
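The encoding formula above is invertible, which is handy for inspecting what a raw observation means. A minimal sketch (the function names `encode`/`decode` are our own; Taxi-v3 exposes an equivalent `decode` internally):

```python
def encode(taxi_row: int, taxi_col: int,
           passenger_location: int, destination: int) -> int:
    """Pack the four state components into a single integer in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination

def decode(state: int) -> tuple[int, int, int, int]:
    """Invert encode(): recover (taxi_row, taxi_col, passenger_location, destination)."""
    state, destination = divmod(state, 4)          # destination has 4 values
    state, passenger_location = divmod(state, 5)   # passenger has 5 values
    taxi_row, taxi_col = divmod(state, 5)          # 5x5 taxi grid
    return taxi_row, taxi_col, passenger_location, destination
```

For example, `encode(1, 1, 0, 3)` gives 123, and `decode(123)` recovers `(1, 1, 0, 3)`.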

Reward Structure

  • -1 per step unless another reward is triggered.
  • +20 for delivering the passenger.
  • -10 for executing the “pickup” or “drop-off” action illegally.
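The reward scheme above can be written as a small function. This is a sketch of the rules as listed, not the environment's code; the two boolean flags are our own simplification of the environment's internal checks.

```python
STEP_PENALTY = -1          # default cost of any step
DELIVERY_REWARD = 20       # successful drop-off at the destination
ILLEGAL_ACTION_PENALTY = -10  # pickup/drop-off at the wrong place

def step_reward(delivered: bool, illegal_pickup_or_dropoff: bool) -> int:
    """Per-step reward under the Taxi-v3 scheme described above."""
    if delivered:
        return DELIVERY_REWARD
    if illegal_pickup_or_dropoff:
        return ILLEGAL_ACTION_PENALTY
    return STEP_PENALTY
```

An episode that delivers the passenger on its 10th step, with no illegal actions, therefore earns a return of 9 × (-1) + 20 = 11, so shorter successful routes score strictly higher.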

Our Implementation & Results

(Figure: our implementation results)