@joschu
Last active February 22, 2017 01:16

Revisions

  1. joschu revised this gist May 3, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 1-cem-v1-writeup.md
    @@ -12,7 +12,7 @@ Note that the same exact parameters were used for all tasks.
    The important parameters are:

    - ``hid_sizes=10,5``: hidden layer sizes of MLP
   -- ``extra_std=0.01``: noise added to variance, see [1]
   +- ``extra_std=0.001``: noise added to variance, see [1]
    - ``batch_size=200``: number of episodes per batch
    - ``seed=0``: random seed

  2. joschu created this gist May 3, 2016.
    32 changes: 32 additions & 0 deletions 1-cem-v1-writeup.md
    @@ -0,0 +1,32 @@
    This is a tiny update to https://gist.github.com/joschu/a21ed1259d3f8c7bdff178fb47bc6fc1#file-1-cem-v0-writeup-md

    - I ran experiments on the v1 mujoco environments
    - I reduced the added noise `extra_std` parameter from `0.01` to `0.001`

    I used the cross-entropy method (an evolutionary algorithm / derivative-free optimization method) to optimize small two-layer neural networks.

    Code used to obtain these results can be found at the url
    https://github.com/joschu/modular_rl, commit ba42955b41d7f419470a95d875af1ab7e7ee66fc.
    The command line expression used for all the environments can be found in the text file below.
    Note that the same exact parameters were used for all tasks.
    The important parameters are:

    - ``hid_sizes=10,5``: hidden layer sizes of MLP
    - ``extra_std=0.01``: noise added to variance, see [1]
    - ``batch_size=200``: number of episodes per batch
    - ``seed=0``: random seed

    The program is single-threaded and deterministic. I used ``float32`` precision, with ``THEANO_FLAGS=floatX=float32``.

    The following commands will let you conveniently run all of the experiments at once.

    1. Find a computer with many CPUs.
    2. If it's a headless computer, ``sudo apt-get install xvfb``. Then type ``xvfb-run -s "-screen 0 1400x900x24" /bin/bash`` to enter a shell where all your commands will benefit from a fake monitor provided by xvfb.
    3. Navigate into the ``modular_rl`` directory.
    4. ``export THEANO_FLAGS=floatX=float32; export outdir=/YOUR/PATH/HERE; export NUM_CPUS=YOUR_NUMBER_OF_CPUS``
    5. Move `2-cem-scripts.txt` into the `modular_rl` directory.
    6. Run all experiments with the following command: ``cat 2-cem-scripts.txt | xargs -n 1 -P $NUM_CPUS bash -c``.
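
For readers who prefer not to rely on `xargs`, the final step can be approximated in Python. This is only a sketch under stated assumptions: the helper names `load_commands` and `run_all` are hypothetical, and it assumes each quoted string in the scripts file is one complete shell command, which is how `xargs -n 1 ... bash -c` treats it.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def load_commands(path):
    """Split the file into one command per quoted string, like xargs -n 1."""
    with open(path) as fh:
        # shlex removes the outer double quotes but leaves $outdir untouched,
        # so bash can expand it from the environment later.
        return shlex.split(fh.read())


def run_all(commands, num_workers):
    """Run each command with `bash -c`, at most num_workers at a time."""
    def run_one(cmd):
        return subprocess.run(["bash", "-c", cmd]).returncode

    # Threads are enough here: each worker only waits for its child process.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_one, commands))
```

For example, `run_all(load_commands("2-cem-scripts.txt"), num_workers=8)` would return one exit code per experiment, in file order. The child processes inherit the environment, so `THEANO_FLAGS` and `outdir` still need to be exported first.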

    You can also set `--video=0` in these scripts to disable video recording. If video is disabled, you won't need the xvfb commands.

    [1] Szita, István, and András Lörincz. "Learning Tetris using the noisy cross-entropy method." Neural computation 18.12 (2006): 2936-2941.
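
The commands in `2-cem-scripts.txt` differ only in the environment name and the output file suffix, so the file can be regenerated rather than edited by hand. The sketch below is a convenience, not part of the original repository: `ENVS`, `TEMPLATE`, and `make_script_lines` are hypothetical names, and the optional `video` argument simply appends the `--video` flag mentioned in the writeup.

```python
# (environment id, output suffix) pairs, copied from 2-cem-scripts.txt
ENVS = [
    ("Walker2d-v1", "walker"),
    ("Swimmer-v1", "swimmer"),
    ("Hopper-v1", "hopper"),
    ("Ant-v1", "ant"),
    ("InvertedPendulum-v1", "ip"),
    ("InvertedDoublePendulum-v1", "idp"),
    ("Reacher-v1", "reacher"),
    ("HalfCheetah-v1", "hc"),
    ("Humanoid-v1", "humanoid"),
]

TEMPLATE = (
    "python run_cem.py --n_iter=250 --batch_size=200 "
    "--agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 "
    "--env={env} --extra_std=0.001 --seed=0 "
    "--outfile=$outdir/cem10-5-{tag}"
)


def make_script_lines(video=None):
    """Return one double-quoted command per environment."""
    lines = []
    for env, tag in ENVS:
        cmd = TEMPLATE.format(env=env, tag=tag)
        if video is not None:
            cmd += " --video={}".format(video)
        # The outer quotes make xargs pass the whole command as one argument.
        lines.append('"' + cmd + '"')
    return lines


script_text = "\n".join(make_script_lines()) + "\n"
```

Writing `script_text` to `2-cem-scripts.txt` should reproduce the file as listed in this gist, and `make_script_lines(video=0)` gives the variant with video recording disabled.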
    9 changes: 9 additions & 0 deletions 2-cem-scripts.txt
    @@ -0,0 +1,9 @@
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Walker2d-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-walker"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Swimmer-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-swimmer"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Hopper-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-hopper"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Ant-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-ant"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=InvertedPendulum-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-ip"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=InvertedDoublePendulum-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-idp"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Reacher-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-reacher"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=HalfCheetah-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-hc"
    "python run_cem.py --n_iter=250 --batch_size=200 --agent=modular_rl.agentzoo.DeterministicAgent --hid_sizes=10,5 --env=Humanoid-v1 --extra_std=0.001 --seed=0 --outfile=$outdir/cem10-5-humanoid"