This exercise is intended for candidates applying for PhD positions in COeXISTENCE.
Please send the solution reports to [email protected]
- Let's consider a set of $Q$ individual travellers who want to travel from their origin $O$ to their destination $D$.
- Every day they choose between two alternative routes: $a$ and $b$.
- The cost (travel time) is given by a naive, non-linearly increasing BPR formula:
  $$t_a(q_a) = t^0_a \left(1 + \alpha \left(\frac{q_a}{Q_a}\right)^{\beta}\right)$$
  where:
  - $t_a(q_a)$ is the travel time on arc $a$ (or $b$),
  - $q_a$ is the flow (number of vehicles using the arc),
  - $t^0_a$ is the free-flow travel time (with no other vehicles),
  - $Q_a$ is the capacity (maximal number of vehicles),
  - $\alpha$ and $\beta$ are calibration parameters, conventionally $\alpha = 0.15$ and $\beta = 4$.
- Let's consider the following parameterization:
  - $Q = 1000$ veh/h
  - $t^0_a = 5$ min
  - $t^0_b = 15$ min
  - $Q_a = 500$ veh/h
  - $Q_b = 800$ veh/h
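- Compute (analytically) the System Optimum and the User Equilibrium of this system:
  - The System Optimum is the solution where total costs are minimised:
    $$\min_{q_a,\, q_b \ge 0} \; q_a\, t_a(q_a) + q_b\, t_b(q_b) \quad \text{s.t.} \quad q_a + q_b = Q$$
  - The User Equilibrium is the state in which each traveller is individually satisfied, i.e. no one can reduce their travel time by switching routes; when both routes are used, this means $t_a(q_a) = t_b(q_b)$.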
- Now let's reformulate the above as a Reinforcement Learning problem, i.e. every day each agent (traveller) decides which path to take so as to maximise her reward.
- What are the state, environment, reward, policy and action in this problem?
- Implement and solve RL problems that find the SO and the UE.
- Propose the problem reformulation, such that deep RL method is needed to solve it (e.g. stochastic environment, imperfect knowledge or competing agents).
- Comment on the problem: formulation, complexity, convergence, algorithms used, reward functions, etc.
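For the task of computing the SO and UE, the two conditions can be cross-checked numerically with the minimal sketch below. It assumes the conventional BPR constants $\alpha = 0.15$ and $\beta = 4$, i.e. $t_r(q_r) = t^0_r \left(1 + 0.15\,(q_r/Q_r)^4\right)$ for $r \in \{a, b\}$, which the exercise does not fix explicitly. At the UE the travel times of the used routes are equal; at the SO the marginal costs are equal, where $\frac{d}{dq_r}\left[q_r t_r(q_r)\right] = t^0_r \left(1 + \alpha(1+\beta)(q_r/Q_r)^{\beta}\right)$. In both cases the cost gap between $a$ and $b$ grows monotonically with $q_a$, so bisection on $q_a \in [0, Q]$ finds the split. The function names are illustrative, and the sketch is meant to verify an analytical answer, not to replace it.

```python
# Assumed BPR constants: the conventional values, not stated in the exercise.
ALPHA, BETA = 0.15, 4
Q = 1000.0                      # total demand [veh/h]
T0 = {"a": 5.0, "b": 15.0}      # free-flow travel time [min]
CAP = {"a": 500.0, "b": 800.0}  # capacity [veh/h]


def travel_time(r: str, q: float) -> float:
    """BPR travel time [min] on route r carrying flow q."""
    return T0[r] * (1.0 + ALPHA * (q / CAP[r]) ** BETA)


def marginal_cost(r: str, q: float) -> float:
    """d/dq [q * t_r(q)] = t_r(q) + q * t_r'(q) for the BPR function."""
    return T0[r] * (1.0 + ALPHA * (1 + BETA) * (q / CAP[r]) ** BETA)


def bisect_split(cost, tol: float = 1e-9) -> float:
    """Return q_a in [0, Q] where cost('a', q_a) == cost('b', Q - q_a).

    The gap is increasing in q_a; if it has the same sign at both ends,
    the corresponding corner solution (all on one route) is returned.
    """
    def gap(q_a: float) -> float:
        return cost("a", q_a) - cost("b", Q - q_a)

    if gap(0.0) >= 0:
        return 0.0
    if gap(Q) <= 0:
        return Q
    lo, hi = 0.0, Q
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def total_cost(q_a: float) -> float:
    """Total travel time of all travellers [veh*min/h] for a given split."""
    q_b = Q - q_a
    return q_a * travel_time("a", q_a) + q_b * travel_time("b", q_b)


q_ue = bisect_split(travel_time)    # equal travel times
q_so = bisect_split(marginal_cost)  # equal marginal costs
print(f"UE: q_a = {q_ue:.1f} veh/h, total cost = {total_cost(q_ue):.0f}")
print(f"SO: q_a = {q_so:.1f} veh/h, total cost = {total_cost(q_so):.0f}")
```

With these assumed constants neither corner is an equilibrium: $t_a(0) = 5 < t_b(Q) \approx 20.5$ min and $t_a(Q) = 17 > t_b(0) = 15$ min, so both routes carry traffic and the interior conditions apply. The UE then puts roughly 955 veh/h on route $a$ and the SO roughly 646 veh/h, with a clearly lower total cost at the SO.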
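For the task of implementing the RL problems, one possible approach is sketched below under several assumptions that are not part of the exercise: the conventional BPR constants ($\alpha = 0.15$, $\beta = 4$); full information, i.e. every traveller observes the travel times of both routes at the end of each day; and a softmax (logit) policy over the two actions. Each day is a stateless episode, the action is the route, and the reward is the negative travel time (targeting the UE) or the negative marginal social cost $t_r + q_r t_r'(q_r)$ (targeting the SO, following the classical marginal-cost-pricing argument that travellers who pay for the delay they impose on others reach the SO as their equilibrium). Because all agents start from the same estimates (the free-flow times) and receive the same feedback, their value estimates stay identical, so a single table is kept and agents differ only in their sampled actions. `simulate`, `lr` and `temperature` are hypothetical names.

```python
import math
import random

# Assumed BPR constants: the conventional values, not stated in the exercise.
ALPHA, BETA = 0.15, 4
Q = 1000                        # number of travellers (agents)
T0 = {"a": 5.0, "b": 15.0}      # free-flow travel time [min]
CAP = {"a": 500.0, "b": 800.0}  # capacity [veh/h]
ROUTES = ("a", "b")


def travel_time(r: str, q: int) -> float:
    """BPR travel time [min] on route r carrying q vehicles."""
    return T0[r] * (1 + ALPHA * (q / CAP[r]) ** BETA)


def marginal_cost(r: str, q: int) -> float:
    """Marginal social cost d(q * t_r(q))/dq = t_r(q) + q * t_r'(q)."""
    return T0[r] * (1 + ALPHA * (1 + BETA) * (q / CAP[r]) ** BETA)


def simulate(objective: str = "UE", days: int = 600, lr: float = 0.05,
             temperature: float = 1.0, seed: int = 0) -> float:
    """Day-to-day learning; returns the mean flow on route a over the last third of days."""
    rng = random.Random(seed)
    cost = travel_time if objective == "UE" else marginal_cost
    # Value = expected reward = -expected cost, initialised at free-flow times.
    value = {r: -T0[r] for r in ROUTES}
    history = []
    for _ in range(days):
        # Softmax (logit) policy: probability of choosing route a.
        p_a = 1.0 / (1.0 + math.exp((value["b"] - value["a"]) / temperature))
        # Every agent samples its own action independently.
        q_a = sum(rng.random() < p_a for _ in range(Q))
        flows = {"a": q_a, "b": Q - q_a}
        # Full-information update: rewards of both routes are observed.
        for r in ROUTES:
            reward = -cost(r, flows[r])
            value[r] += lr * (reward - value[r])
        history.append(q_a)
    tail = history[-(days // 3):]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print("UE-oriented rewards, mean flow on a:", simulate("UE"))
    print("SO-oriented rewards, mean flow on a:", simulate("SO"))
```

With a positive temperature the flows settle at a logit (stochastic) equilibrium rather than exactly at the analytical UE or SO; lowering the temperature reduces the gap but makes the day-to-day dynamics more prone to oscillation unless `lr` is reduced as well. A bandit variant, in which each agent updates only the estimate of the route it actually took, would be a step towards the imperfect-knowledge setting.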
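For the deep-RL reformulation, one hypothetical option is a stochastic environment with imperfect knowledge: on any day each route may suffer an incident that cuts its capacity, and travellers observe only their own experienced travel time, never the capacities or the other route. The sketch below assumes the conventional BPR constants ($\alpha = 0.15$, $\beta = 4$); the incident probability and capacity factor are invented for illustration, as are the class and method names. It defines only the environment; a learner would be plugged in on top.

```python
import random

# Assumed BPR constants (conventional values) and a hypothetical incident model.
ALPHA, BETA = 0.15, 4
T0 = (5.0, 15.0)          # free-flow travel time of route a (0) and b (1) [min]
CAP = (500.0, 800.0)      # nominal capacity [veh/h]
P_INCIDENT = 0.1          # hypothetical daily incident probability per route
INCIDENT_FACTOR = 0.5     # hypothetical capacity multiplier on an incident day


class StochasticRouteEnv:
    """Two-route day-to-day environment with random daily capacities.

    Agents never observe the capacities or the other route; each one only
    receives the reward (negative travel time) of the route it took.
    """

    def __init__(self, n_agents: int = 1000, seed: int = 0) -> None:
        self.n_agents = n_agents
        self.rng = random.Random(seed)

    def _capacity(self, route: int) -> float:
        # Draw today's capacity: reduced with probability P_INCIDENT.
        incident = self.rng.random() < P_INCIDENT
        return CAP[route] * (INCIDENT_FACTOR if incident else 1.0)

    def step(self, actions: list[int]) -> list[float]:
        """Play one day; actions[i] is 0 (route a) or 1 (route b) for agent i."""
        if len(actions) != self.n_agents:
            raise ValueError("expected one action per agent")
        flows = (actions.count(0), actions.count(1))
        times = [T0[r] * (1 + ALPHA * (flows[r] / self._capacity(r)) ** BETA)
                 for r in (0, 1)]
        # Each agent sees only its own experienced travel time.
        return [-times[a] for a in actions]


if __name__ == "__main__":
    env = StochasticRouteEnv(seed=1)
    rewards = env.step([0] * 900 + [1] * 100)
    print("reward on a:", rewards[0], "reward on b:", rewards[-1])
```

A tabular learner can still be run on this environment, but once agents condition on their own reward histories or on shared signals (for example an incident report), the state space grows quickly; that is the kind of setting where function approximation, and hence deep RL, may pay off.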
(c) Rafał Kucharski, 2023