Qualification project - RL in Urban Mobility

This exercise is intended for candidates for PhD student positions in COeXISTENCE.

Please send the solution reports to [email protected]

Consider a set of $Q$ individual travellers who want to travel from their origin $O$ to their destination $D$.
Every day they choose between two alternative routes: $a$ and $b$.
The cost (travel time) of each route is given by a simple, non-linearly increasing BPR (Bureau of Public Roads) formula:

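$t_a(q_a) = t^0_a (1 + (q_a / Q_a)^2)$, and analogously $t_b(q_b) = t^0_b (1 + (q_b / Q_b)^2)$,

where $q_a$ is the number of travellers choosing route $a$, $t^0_a$ is its free-flow travel time and $Q_a$ is its capacity (and likewise for route $b$).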

Compute (analytically) the System Optimum (SO) and the User Equilibrium (UE) of this system.

The System Optimum minimises the total travel time:

$\min_{q_a, q_b} \; t_a(q_a) \, q_a + t_b(q_b) \, q_b$, s.t. $q_a + q_b = Q$, $q_a, q_b \geq 0$
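A sketch of the analytical SO, assuming route $b$ follows the same cost function with parameters $t^0_b$ and $Q_b$, and substituting $q_b = Q - q_a$. Since $t(q)\,q = t^0 (q + q^3 / C^2)$ for a route with capacity $C$, the marginal cost of one extra traveller is

$$\frac{d}{dq}\big[t(q)\,q\big] = t^0 \left(1 + 3\left(\frac{q}{C}\right)^2\right).$$

The objective is convex in $q_a$ (its second derivative $6 t^0_a q_a / Q_a^2 + 6 t^0_b q_b / Q_b^2$ is non-negative), so an interior SO is where the two marginal costs are equal:

$$t^0_a \left(1 + 3\left(\frac{q_a}{Q_a}\right)^2\right) = t^0_b \left(1 + 3\left(\frac{Q - q_a}{Q_b}\right)^2\right).$$

With $\alpha = 3 t^0_a / Q_a^2$ and $\beta = 3 t^0_b / Q_b^2$ this becomes the quadratic (linear if $\alpha = \beta$)

$$(\alpha - \beta)\, q_a^2 + 2 \beta Q\, q_a + \left(t^0_a - t^0_b - \beta Q^2\right) = 0,$$

whose root in $[0, Q]$, if it exists, is unique and gives $q_a^{SO}$; otherwise the SO is a corner solution with all travellers on the route whose marginal cost stays lower. The UE condition $t_a(q_a) = t_b(q_b)$ leads to the same quadratic with $\alpha = t^0_a / Q_a^2$ and $\beta = t^0_b / Q_b^2$, i.e. without the factor 3.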

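The User Equilibrium is the state in which each traveller is individually satisfied, i.e. no traveller can reduce her travel time by switching routes, so that $t_a(q_a) = t_b(q_b)$ whenever both routes are used.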
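To check the analytical results numerically, the sketch below solves both conditions by bisection, assuming the cost function above for both routes. The UE balances travel times, $t_a(q_a) = t_b(Q - q_a)$; the SO balances marginal costs, i.e. the derivatives of $t(q)\,q$, which are $t^0(1 + 3(q/Q_a)^2)$ for route $a$ and analogously for $b$. Both differences increase with $q_a$, so each has at most one root in $[0, Q]$ and a corner solution otherwise. The function names and the (t0, capacity) tuple representation are hypothetical.

```python
def travel_time(q, t0, cap):
    # Cost function from the text: t(q) = t0 * (1 + (q / cap)^2)
    return t0 * (1.0 + (q / cap) ** 2)


def marginal_cost(q, t0, cap):
    # d/dq [t(q) * q] = t0 * (1 + 3 (q / cap)^2): system-wide cost of one more traveller
    return t0 * (1.0 + 3.0 * (q / cap) ** 2)


def split(Q, route_a, route_b, cost, tol=1e-9):
    """Flow q_a on route a that equalises cost on both routes, or a corner solution.

    route_a, route_b: (t0, cap) tuples (hypothetical representation).
    cost: travel_time for the UE, marginal_cost for the SO.
    """
    def gap(q_a):
        return cost(q_a, *route_a) - cost(Q - q_a, *route_b)

    if gap(0.0) >= 0.0:   # a is no better even when empty: everyone takes b
        return 0.0
    if gap(Q) <= 0.0:     # a is no worse even when everyone uses it
        return float(Q)
    lo, hi = 0.0, float(Q)
    while hi - lo > tol:  # gap increases with q_a, so bisection keeps the root bracketed
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def total_time(q_a, Q, route_a, route_b):
    # SO objective: t_a(q_a) * q_a + t_b(q_b) * q_b with q_b = Q - q_a
    q_b = Q - q_a
    return travel_time(q_a, *route_a) * q_a + travel_time(q_b, *route_b) * q_b
```

For instance, with the made-up values $Q = 100$, $(t^0_a, Q_a) = (10, 50)$ and $(t^0_b, Q_b) = (15, 80)$, `split(100, (10.0, 50.0), (15.0, 80.0), travel_time)` gives $q_a^{UE}$ of roughly 51.35, while passing `marginal_cost` gives $q_a^{SO}$ of roughly 46.06; the total travel time drops from about 2055 at the UE to about 2028 at the SO, since the SO shifts some travellers to the higher-capacity route $b$.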

Now let's reformulate the above as a Reinforcement Learning (RL) problem, i.e. each agent (traveller) decides every day which route to take so as to maximise her reward.
What are the state, environment, reward, policy and action in this problem?
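One possible mapping of these ingredients, stated as an assumption rather than the expected answer: the environment is the two-route network with the cost function above; the action of each agent is a route, $a$ or $b$; the reward is minus her experienced travel time; the policy maps what she has learnt so far (e.g. average past travel times per route) to a route choice; and in this basic version the state is trivial, since every day looks the same, which makes the problem a repeated multi-agent bandit. The class names `Route` and `RouteChoiceEnv` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Route:
    t0: float   # free-flow travel time
    cap: float  # capacity

    def travel_time(self, q: float) -> float:
        # Cost function from the text: t(q) = t0 * (1 + (q / cap)^2)
        return self.t0 * (1.0 + (q / self.cap) ** 2)


class RouteChoiceEnv:
    """One step = one day: all agents pick a route simultaneously."""

    def __init__(self, routes: Dict[str, Route]):
        self.routes = routes

    def step(self, actions: List[str]) -> List[float]:
        # Flow on each route is the number of agents that chose it.
        flows = {r: actions.count(r) for r in self.routes}
        times = {r: self.routes[r].travel_time(flows[r]) for r in self.routes}
        # Reward of each agent: minus her own travel time.
        return [-times[a] for a in actions]
```

For example, `RouteChoiceEnv({"a": Route(10.0, 50.0), "b": Route(15.0, 80.0)}).step(["a", "a", "b"])` returns one reward per agent, with both agents on route $a$ receiving the same value.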
Implement and solve the RL problems whose solutions correspond to the SO or the UE.
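A minimal sketch of one way to do this, assuming independent, stateless Q-learners with a softmax policy rather than any specific method the exercise expects. With the selfish reward (minus the travel time) the agents should drift towards the UE; replacing it with minus the marginal cost $t^0(1 + 3(q/Q_a)^2)$, i.e. charging each traveller the delay she imposes on others (a marginal-cost toll), aligns individual incentives with the SO. Because the policy is a softmax, the flows settle around a logit (stochastic) equilibrium that approaches the deterministic UE or SO as the temperature `tau` goes to 0. The function `simulate` and its default parameters are hypothetical.

```python
import math
import random


def travel_time(q, t0, cap):
    # Cost function from the text: t(q) = t0 * (1 + (q / cap)^2)
    return t0 * (1.0 + (q / cap) ** 2)


def marginal_cost(q, t0, cap):
    # d/dq [t(q) * q]: own travel time plus the extra delay imposed on others
    return t0 * (1.0 + 3.0 * (q / cap) ** 2)


def simulate(n_agents, routes, cost, days=2000, lr=0.05, tau=2.0, seed=0):
    """Independent stateless Q-learners with a softmax (Boltzmann) policy.

    routes: {"a": (t0, cap), "b": (t0, cap)}
    cost:   travel_time for the selfish (UE) reward, marginal_cost for the SO reward
    Returns the list of daily flows on route "a".
    """
    rng = random.Random(seed)
    # One value estimate per route and agent; 0 is optimistic since all rewards are negative.
    values = [{"a": 0.0, "b": 0.0} for _ in range(n_agents)]
    flows_a = []
    for _ in range(days):
        actions = []
        for v in values:
            # Softmax over two routes; clamp the exponent to avoid overflow.
            z = max(-50.0, min(50.0, (v["b"] - v["a"]) / tau))
            p_a = 1.0 / (1.0 + math.exp(z))
            actions.append("a" if rng.random() < p_a else "b")
        flow = {r: actions.count(r) for r in routes}
        reward = {r: -cost(flow[r], *routes[r]) for r in routes}
        # Each agent updates only the value of the route she actually took.
        for v, act in zip(values, actions):
            v[act] += lr * (reward[act] - v[act])
        flows_a.append(flow["a"])
    return flows_a
```

For example, `simulate(100, {"a": (10.0, 50.0), "b": (15.0, 80.0)}, travel_time)` returns the daily flow on route $a$; averaging its tail, and repeating with `marginal_cost`, allows a comparison with the analytical UE and SO. A smaller `lr` or a larger `tau` tends to damp the day-to-day oscillations that arise when many agents react to the same experience at once, at the price of slower learning or a larger gap to the exact equilibrium.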
Propose a reformulation of the problem such that a deep RL method is needed to solve it (e.g. a stochastic environment, imperfect knowledge or competing agents).
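As one hypothetical illustration of such a reformulation (not the expected answer), the sketch below combines a stochastic environment with imperfect knowledge: on a random day route $b$ is hit by an incident that cuts its capacity, and each morning travellers receive only a noisy signal about it. A good policy must now condition on the observed signal; with more routes, richer or continuous signals, or agents that also observe others' past choices, the state space quickly outgrows tabular methods, which is where function approximation such as deep Q-networks becomes a natural choice. The class name, incident probability, capacity drop and signal accuracy are all made-up assumptions.

```python
import random
from typing import List


class IncidentRouteEnv:
    """Hypothetical two-route environment with random incidents on route b.

    Each morning agents receive a noisy signal about whether route b is
    disrupted (imperfect knowledge); the true capacity of b is only revealed
    through the travel times experienced that day.
    """

    def __init__(self, p_incident=0.1, cap_drop=0.5, signal_accuracy=0.8, seed=0):
        self.routes = {"a": (10.0, 50.0), "b": (15.0, 80.0)}  # (t0, capacity), illustrative
        self.p_incident = p_incident            # probability of an incident on a given day
        self.cap_drop = cap_drop                # fraction of route b capacity lost
        self.signal_accuracy = signal_accuracy  # probability that the signal is correct
        self.rng = random.Random(seed)
        self.incident = False

    def observe(self) -> int:
        """Draw today's incident and return a noisy signal about it (1 = incident reported)."""
        self.incident = self.rng.random() < self.p_incident
        correct = self.rng.random() < self.signal_accuracy
        return int(self.incident if correct else not self.incident)

    def step(self, actions: List[str]) -> List[float]:
        """Rewards (minus travel times) for today's route choices."""
        times = {}
        for r, (t0, cap) in self.routes.items():
            if r == "b" and self.incident:
                cap *= 1.0 - self.cap_drop
            q = actions.count(r)
            times[r] = t0 * (1.0 + (q / cap) ** 2)
        return [-times[a] for a in actions]
```

A day then consists of calling `observe()`, letting every agent choose a route from the signal, and calling `step()` with the chosen routes.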
Comment on the problem: formulation, complexity, convergence, algorithms used, reward functions, etc.

RafalKucharskiPK/urban_RL_games.md