@AHEADer
Last active February 5, 2020 07:09
benchmark

Benchmark

This is a speed benchmark for distributed training.
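As a rough illustration of what "speed" could mean here, the sketch below measures throughput in samples per second around a training step. The names `measure_throughput` and `dummy_step`, the warm-up count, and the batch size are all assumptions made for illustration; none of them come from this benchmark. With a GPU framework, the step function would also need to wait for outstanding device work before the clock is read, since kernel launches are typically asynchronous.

```python
import time


def measure_throughput(step_fn, batch_size, warmup_steps=2, timed_steps=10):
    """Return samples/second for step_fn, skipping the first warmup_steps calls."""
    # Warm-up iterations are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * timed_steps / elapsed


def dummy_step():
    # Hypothetical stand-in for one training iteration; replace with the real step.
    time.sleep(0.001)


throughput = measure_throughput(dummy_step, batch_size=32)
print(f"{throughput:.1f} samples/s")
```

Reporting samples per second rather than seconds per step keeps numbers comparable across runs that use different batch sizes.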

Environment

System configuration

  • Ubuntu xxx
  • CUDA xxx
  • NCCL xxx

Framework

  • Autobot xxx
  • TensorFlow xxx
  • PyTorch xxx
  • MXNet xxx

Profiling tools

  • cProfile
  • NVIDIA Nsight Systems
  • Profiling tools provided by each framework
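A minimal sketch of how cProfile from the list above might be wrapped around a few training iterations. `train_step` is a hypothetical placeholder, not code from any of the frameworks listed, and `profile_steps` is a helper name chosen for this example. cProfile only sees Python-level host time, so GPU kernel time would still need a tool such as NVIDIA Nsight Systems.

```python
import cProfile
import io
import pstats


def train_step(n=10000):
    # Hypothetical stand-in for one training iteration; replace with the real step.
    return sum(i * i for i in range(n))


def profile_steps(step_fn, num_steps=5, top=5):
    """Run step_fn num_steps times under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(num_steps):
        step_fn()
    profiler.disable()
    # Collect the report in a string instead of printing straight to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


report = profile_steps(train_step)
print(report)
```

Sorting by cumulative time puts the functions that dominate a step, including time spent in their callees, at the top of the report.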

Testing models and experiments

Models

  • Image classification: ResNet50, VGG16
  • Translation: GNMT-16
  • Video captioning: S2VT

Experiments

Each experiment below should be run on all four deep learning frameworks.

  1. Different GPU placements (e.g. 4 GPUs on different nodes)
  2. With or without Horovod; our Horovod vs. the official Horovod
  3. RDMA or socket
  4. Different parallel architectures
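The experiment list above implies a grid of runs. The sketch below enumerates one possible grid from the frameworks, models, transport choices, and Horovod variants named in this document. The dictionary keys and the labels `"none"`, `"ours"`, and `"official"` are assumptions for illustration. GPU placement and parallel architecture are left out because their concrete values are not specified here.

```python
from itertools import product

FRAMEWORKS = ["Autobot", "TensorFlow", "PyTorch", "MXNet"]
MODELS = ["ResNet50", "VGG16", "GNMT-16", "S2VT"]
TRANSPORTS = ["RDMA", "socket"]
# Assumed labels: no Horovod, our Horovod, the official Horovod.
HOROVOD_VARIANTS = ["none", "ours", "official"]


def build_runs():
    """Return one config dict per combination of the axes above."""
    return [
        {"framework": f, "model": m, "transport": t, "horovod": h}
        for f, m, t, h in product(FRAMEWORKS, MODELS, TRANSPORTS, HOROVOD_VARIANTS)
    ]


runs = build_runs()
print(len(runs), "runs, first:", runs[0])
```

Generating the grid up front makes it easy to see how many runs the plan implies before any cluster time is booked.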

Benchmark results

TODO
