CUDA performance of training code, Chapter 8, much lower than expected #32

Transigent opened this issue Feb 17, 2019 · 2 comments

Transigent commented Feb 17, 2019

Hello

Firstly, thanks for the great book; I have learned a great deal from it. This is my first foray into the world of machine learning and I am having a ball.

I am attempting to get the stock trading sample from Chapter 8 running. Initially it seems to run, but I am finding that performance with CUDA enabled is much lower than I expected.

My system specs are as follows:
Phenom II X6 1055T
GTX 980 4GB, water-cooled
24GB RAM
Windows 10 64-bit

Software:
Anaconda 3
Python 3.6.7
PyCharm 2018.3

An initial pass through train_model.py without CUDA enabled yields the following performance:

[screenshot: pytorch-4-1-cpu]

Note the GPU is idle as expected, at about 1% utilization.
[screenshot: pytorch-4-1-cpu-temp]

Task Manager shows the CPU at 100% as expected, and the GPU at 0%.
[screenshot: pytorch-4-1-cpu-taskman]

After enabling CUDA with the --cuda switch, performance is only marginally better. Note that the GPU is at about 2% load according to Task Manager, or 9% according to the GPU monitor, and its temperature has risen a whole 1-2 °C on average (if I run FurMark the GPU sits at 99% and the temperature quickly rises from 33 to maybe 48 °C, on water cooling). It's fast up until the buffer is populated and training starts; after that it takes about 15 seconds to print one line, i.e. 100 epochs.

[screenshot: pytorch-4-1-gpu]

The GPU is barely doing anything.
[screenshot: pytorch-4-1-gpu-temp]

Task Manager says the GPU is 2% utilized. The CPU has a few peaks but averages about 30%.

[screenshot: pytorch-4-1-gpu-taskman]

It appears that the mode is changing from CPU to GPU, since CPU usage goes down and GPU usage goes up. But the GPU only improved throughput by maybe 60%, which seems almost negligible for a card like the GTX 980, which has 2048 shader units, 128 texture units, and 64 ROPs. And clearly, at 2% utilization, the GPU is not doing a great deal of acceleration.
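For what it's worth, this pattern seems typical when the network is small: each training step launches tiny CUDA kernels, so per-step overhead (kernel launches, CPU-side data preparation) dominates and the GPU mostly waits. Below is a minimal sketch, not the book's code and with illustrative layer sizes, that times forward/backward passes of a small fully connected net on CPU vs. GPU:

```python
import time

import torch
import torch.nn as nn


def bench(device, steps=500, batch=32):
    # Small MLP roughly on the scale of the Chapter 8 model (sizes are guesses).
    net = nn.Sequential(
        nn.Linear(42, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, 3),
    ).to(device)
    opt = torch.optim.Adam(net.parameters())
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(batch, 42, device=device)
    y = torch.randint(0, 3, (batch,), device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish setup before starting the clock
    start = time.time()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return steps / (time.time() - start)


print("CPU steps/s: %.1f" % bench(torch.device("cpu")))
if torch.cuda.is_available():
    print("GPU steps/s: %.1f" % bench(torch.device("cuda")))
```

On a net this small the GPU speedup is usually modest; increasing the batch size typically widens the gap, since each kernel then has more work per launch.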

I assume I must have done something wrong. I have been trying different variations of packages and settings for a few days, including:

  • Trying different Python versions, 3.6 and 3.7 (some minor versions would not work with PyTorch at all, e.g. 3.6.8)
  • Installing the CUDA 9 Windows installer from NVIDIA as a system-wide installation
  • Trying different video drivers: the latest two, and also the one installed by the CUDA installer, which was dated 2017
  • Trying a bunch of different package versions of various things (cudatoolkit, PyTorch, etc.)
  • Running some tests to ensure that CUDA was working:

```python
import torch
torch.cuda.current_device()    # Out[3]: 0
torch.cuda.device(0)           # Out[4]: <torch.cuda.device at 0x153b39c5780>
torch.cuda.device_count()      # Out[5]: 1
torch.cuda.get_device_name(0)  # Out[6]: 'GeForce GTX 980'
```
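Those calls only confirm that CUDA is visible to PyTorch; they don't show whether the training net actually gets moved to the GPU. Here is a small extra check I could add (the report_device helper is my own, hypothetical, not from the repo):

```python
import torch
import torch.nn as nn


def report_device(net):
    # If this prints "cpu" during a --cuda run, the model was never
    # moved to the GPU even though CUDA is available.
    print("model device:", next(net.parameters()).device)


# Quick self-test with a throwaway module:
net = nn.Linear(4, 2)
if torch.cuda.is_available():
    net = net.cuda()
report_device(net)  # expected here: model device: cuda:0
```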

My current package configuration is as follows:

```
Name                 Version    Build                          Channel
anaconda-client      1.7.2      py36_0
anaconda-navigator   1.9.6      py36_0
anaconda-project     0.8.2      py36_0
cuda90               1.0        0                              pytorch
cudatoolkit          9.0        1
cudnn                7.3.1      cuda9.0_0
gym                  0.11.0     pypi_0                         pypi
matplotlib           3.0.2      py36hc8f65d3_0
numpy                1.15.4     py36h19fb1c0_0
opencv-python        4.0.0.21   pypi_0                         pypi
pip                  19.0.1     py36_0
ptan                 0.3        pypi_0                         pypi
python               3.6.7      h9f7ef89_2
pytorch              0.4.1      py36_cuda90_cudnn7he774522_1   pytorch
scipy                1.2.0      py36h29ff71c_0
tensorboard          1.12.2     py36h33f27b4_0
tensorboardx         1.6        pypi_0                         pypi
tensorflow           1.12.0     gpu_py36ha5f9131_0
tensorflow-base      1.12.0     gpu_py36h6e53903_0
tensorflow-gpu       1.12.0     pypi_0                         pypi
torchvision          0.2.1      py_2                           pytorch
```

Hmm, that borked my formatting; here's an image as well:

[screenshot: package-versions]

Any thoughts about what I might have done wrong here would be much appreciated. I'm still just getting a handle on Python and Deep RL.

Thanks for your time!

Chris

@icompute386

Hi,
I'm seeing the same issue, running on an RTX 2080 Ti. I notice that video memory usage goes up with CUDA enabled, though only marginally, and that for the most part the GPU is idle.

CPU usage goes up too, though not to the 100% that occurs when CUDA is not enabled.

[screenshot]
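One way to see where the time actually goes is torch.autograd.profiler. A rough sketch with a toy stand-in for the training step (layer sizes are illustrative only, not the book's model):

```python
import torch
from torch.autograd import profiler

# Toy stand-in for the training step; sizes are illustrative only.
net = torch.nn.Linear(128, 128).cuda()
x = torch.randn(32, 128, device="cuda")

with profiler.profile(use_cuda=True) as prof:
    for _ in range(100):
        net.zero_grad()
        net(x).sum().backward()

# Per-op table of CUDA kernel time vs. CPU-side time; for small nets the
# CPU side dominates, which would match the mostly-idle GPU seen here.
print(prof.key_averages())
```

If the CPU-side time dominates, the bottleneck is the Python/experience-gathering loop rather than the GPU itself.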


lamhk commented Apr 18, 2019

Hi, I just want to share my observations after running through the training for Chapter08. My platform: 1) Ubuntu 16.04; 2) i5-8400 with a GTX 1060 6GB and 16GB RAM. Below is the output of train_model_conv.py, running at around 150 fps with GPU utilization around 85-88% (image attached).

```
1100271: done 77900 games, mean reward -0.202, mean steps 39.56, speed 91.32 f/s, eps 0.10
1106617: done 78000 games, mean reward -0.202, mean steps 39.97, speed 149.52 f/s, eps 0.10
1112197: done 78100 games, mean reward -0.202, mean steps 40.32, speed 149.28 f/s, eps 0.10
1118498: done 78200 games, mean reward -0.202, mean steps 40.76, speed 149.43 f/s, eps 0.10
1123793: done 78300 games, mean reward -0.202, mean steps 41.08, speed 149.51 f/s, eps 0.10
1129957: done 78400 games, mean reward -0.203, mean steps 41.47, speed 149.41 f/s, eps 0.10
1136721: done 78500 games, mean reward -0.203, mean steps 41.94, speed 149.29 f/s, eps 0.10
1142939: done 78600 games, mean reward -0.203, mean steps 42.34, speed 149.45 f/s, eps 0.10
1148525: done 78700 games, mean reward -0.203, mean steps 42.71, speed 149.35 f/s, eps 0.10
1154480: done 78800 games, mean reward -0.203, mean steps 43.07, speed 149.40 f/s, eps 0.10
```

[screenshot: GPU utilization]
