Comparison of modular_rl TRPO on fully-supervised and semi-supervised pendulum tasks.

Algorithm ID: alg_zeXD1NReRguwX4NYBg8Tw
Link to algorithm page: https://gym.openai.com/algorithms/alg_zeXD1NReRguwX4NYBg8Tw

Best 100-episode average reward on Pendulum-v0:
-133.06 ± 10.14
-137.62 ± 9.41
-143.28 ± 10.81
-207.62 ± 15.74
-135.87 ± 8.68
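The headline metric can be read as the highest mean episode reward over any 100 consecutive episodes of a run. The snippet below is a minimal sketch of that reading and rests on an assumption: this sliding-window definition is not spelled out above. The function name `best_window_average` is hypothetical and is not part of Gym's API. The sketch also does not reproduce the ± spread, because the source does not say how that spread was computed.

```python
from typing import Sequence


def best_window_average(episode_rewards: Sequence[float], window: int = 100) -> float:
    """Return the highest mean reward over any `window` consecutive episodes.

    Hypothetical helper: assumes "best 100-episode average reward" means the
    maximum of the sliding-window mean over a run's episode rewards.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(episode_rewards) < window:
        raise ValueError("need at least `window` episodes")

    # Sum of the first window, then slide it one episode at a time.
    running = sum(episode_rewards[:window])
    best = running
    for i in range(window, len(episode_rewards)):
        # Add the newest episode and drop the one that left the window.
        running += episode_rewards[i] - episode_rewards[i - window]
        best = max(best, running)
    return best / window


# Usage: 100 poor episodes followed by 100 better ones (illustrative numbers only).
rewards = [-200.0] * 100 + [-100.0] * 100
print(best_window_average(rewards))  # -100.0
```

A running sum keeps the scan linear in the number of episodes. Recomputing each window's mean from scratch would cost 100 additions per step.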