[-------------- normalize @ torchvision==0.15.0a0+b1f6c9e ---------------]
                                        |      v1       |      v2
1 threads: ---------------------------------------------------------------
  (3, 400, 400) / float32 / cpu         |   128 (+- 1)  |    93 (+- 2)
  (3, 400, 400) / float32 / cuda        |    91 (+- 0)  |    54 (+- 0)
  (16, 3, 400, 400) / float32 / cpu     |  3528 (+- 26) |  2507 (+- 9)
  (16, 3, 400, 400) / float32 / cuda    |   764 (+- 2)  |   501 (+- 1)
6 threads: ---------------------------------------------------------------
  (3, 400, 400) / float32 / cpu         |    54 (+- 0)  |    36 (+- 0)
  (16, 3, 400, 400) / float32 / cpu     |   381 (+- 3)  |   289 (+- 3)
Times are in microseconds (us).
Aggregated performance change of v2 vs. v1: -30.1% (improvement)
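
For readers who want the per-configuration numbers behind the aggregate, the sketch below computes the relative change of v2 versus v1 for each row. It uses the rounded timings printed in the table, and the `rows` list and `relative_change` helper are illustrative names, not part of the benchmark script. The formula used to aggregate the rows into the figure above is not stated here, so this sketch does not attempt to reproduce it.

```python
# Timings in microseconds, copied from the table: (configuration, v1, v2).
rows = [
    ("1 thread,  (3, 400, 400) / float32 / cpu", 128, 93),
    ("1 thread,  (3, 400, 400) / float32 / cuda", 91, 54),
    ("1 thread,  (16, 3, 400, 400) / float32 / cpu", 3528, 2507),
    ("1 thread,  (16, 3, 400, 400) / float32 / cuda", 764, 501),
    ("6 threads, (3, 400, 400) / float32 / cpu", 54, 36),
    ("6 threads, (16, 3, 400, 400) / float32 / cpu", 381, 289),
]


def relative_change(v1, v2):
    """Percentage change of v2 relative to v1; negative means v2 is faster."""
    return (v2 - v1) / v1 * 100


changes = [(label, relative_change(v1, v2)) for label, v1, v2 in rows]

for label, change in changes:
    print(f"{label}: {change:+.1f}%")
```

With these rounded inputs, every configuration shows v2 as faster, with per-row changes ranging from roughly -24% to -41%.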