Qualification project - RL in Urban Mobility

This exercise is intended for candidates for PhD positions in COeXISTENCE.

Please send the solution reports to [email protected]



  1. Let's consider a set of $Q$ individual travellers who want to travel from their origin $O$ to their destination $D$.
  2. Every day they choose between two alternative routes: $a$ and $b$.
  3. The cost (travel time) is given by a naive, non-linearly increasing BPR formula:

$t_a(q_a) = t^0_a (1 + (q_a / Q_a)^2)$,

where:

  • $t_a(q_a)$ - the travel time on arc $a$ (analogously for $b$)
  • $q_a$ - the flow (number of vehicles using the arc)
  • $t^0_a$ - the free-flow travel time (with no other vehicles)
  • $Q_a$ - the capacity (maximal number of vehicles)
  4. Let's consider the following parameterization (a short code sketch follows this list):
  • $Q$ = 1000 veh/h
  • $t^0_a$ = 5 min
  • $t^0_b$ = 15 min
  • $Q_a$ = 500 veh/h
  • $Q_b$ = 800 veh/h
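To make the setup concrete, here is a minimal Python sketch of the BPR cost function under this parameterization. It is a sketch only; the names `travel_time`, `FREE_FLOW`, and `CAPACITY` are illustrative assumptions, not part of the assignment:

```python
# Minimal sketch of the BPR travel-time function with the given parameters.
# All names here are illustrative, not prescribed by the assignment.

FREE_FLOW = {"a": 5.0, "b": 15.0}    # t^0 in minutes
CAPACITY = {"a": 500.0, "b": 800.0}  # Q_a, Q_b in veh/h
Q = 1000                             # total demand

def travel_time(arc: str, flow: float) -> float:
    """BPR cost: t(q) = t0 * (1 + (q / Q_arc)^2)."""
    return FREE_FLOW[arc] * (1.0 + (flow / CAPACITY[arc]) ** 2)

# Example: everyone on arc a vs. an even 500/500 split.
print(travel_time("a", Q))                           # 25.0 min
print(travel_time("a", 500), travel_time("b", 500))  # 10.0 min, ~20.86 min
```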

Tasks:

  1. Compute (analytically) the System Optimum and User Equilibrium of such a system (a numerical sanity check is sketched after this task list):
  • System Optimum is the solution where the total cost is minimised:

$\min_{q_a, q_b} \; t_a(q_a) \cdot q_a + t_b(q_b) \cdot q_b$, s.t. $q_a + q_b = Q$, $q_a, q_b \geq 0$

  • User Equilibrium is the state in which each traveller is individually satisfied, i.e. no one can reduce her travel time by switching routes: $t_a(q_a) = t_b(q_b)$.
  2. Now let's reformulate the above as Reinforcement Learning, i.e. each agent (traveller) makes a decision every day about which path to take to maximise her reward.
  3. What are the state, environment, reward, policy, and action in this problem?
  4. Implement and solve the RL problems which find the SO or UE (a minimal multi-agent sketch follows this list).
  5. Propose a reformulation of the problem such that a deep RL method is needed to solve it (e.g. stochastic environment, imperfect knowledge, or competing agents).
  6. Comment on the problem: formulation, complexity, convergence, algorithms used, reward functions, etc.
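As referenced in Task 1, one way to sanity-check the analytic SO and UE is to solve them numerically. This is a sketch assuming `scipy` is available; it does not replace the analytic derivation the task asks for:

```python
# Numerical sanity check for Task 1 (the task itself asks for an
# analytic derivation; this only verifies it).
from scipy.optimize import brentq, minimize_scalar

Q = 1000.0
t0 = {"a": 5.0, "b": 15.0}
cap = {"a": 500.0, "b": 800.0}

def t(arc, q):
    return t0[arc] * (1.0 + (q / cap[arc]) ** 2)

# System Optimum: minimise total cost over q_a (with q_b = Q - q_a).
total_cost = lambda qa: qa * t("a", qa) + (Q - qa) * t("b", Q - qa)
so = minimize_scalar(total_cost, bounds=(0.0, Q), method="bounded")
print(f"SO: q_a = {so.x:.1f}, q_b = {Q - so.x:.1f}")

# User Equilibrium: root of t_a(q_a) - t_b(Q - q_a) = 0.
gap = lambda qa: t("a", qa) - t("b", Q - qa)
qa_ue = brentq(gap, 0.0, Q)
print(f"UE: q_a = {qa_ue:.1f}, q_b = {Q - qa_ue:.1f}")
```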
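For Task 3, the sketch below shows one possible design, not the required solution: each traveller runs an independent epsilon-greedy bandit learner over the two routes, with the negative travel time as reward. All hyperparameters are illustrative assumptions:

```python
# Minimal sketch for Task 3: one epsilon-greedy learner per traveller,
# reward = -travel time. Hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_DAYS = 1000, 500
t0 = np.array([5.0, 15.0])       # route 0 = a, route 1 = b
cap = np.array([500.0, 800.0])
alpha, eps = 0.1, 0.1

q_values = np.zeros((N_AGENTS, 2))   # per-agent value estimate of each route

for day in range(N_DAYS):
    greedy = q_values.argmax(axis=1)
    explore = rng.random(N_AGENTS) < eps
    actions = np.where(explore, rng.integers(0, 2, N_AGENTS), greedy)
    flows = np.bincount(actions, minlength=2)
    times = t0 * (1.0 + (flows / cap) ** 2)    # BPR cost on each route
    rewards = -times[actions]                  # each agent pays its route's time
    idx = np.arange(N_AGENTS)
    q_values[idx, actions] += alpha * (rewards - q_values[idx, actions])

print("final split:", np.bincount(q_values.argmax(axis=1), minlength=2))
print("final route times:", np.round(times, 2))
```

With this selfish per-agent reward the population tends toward the User Equilibrium; replacing the reward with the (negative) system-wide average travel time is one simple way to steer the learners toward the System Optimum instead.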

(c) Rafał Kucharski, 2023
