Testing GPU support with Nvidia RAPIDS #128

Open
asnaylor opened this issue Jul 21, 2020 · 2 comments

Comments

@asnaylor
Collaborator

I would like to see whether fast-carpenter can support, and be accelerated by, GPUs, especially as GPUs are appearing more often (and in larger numbers) in HEP computing clusters and supercomputers. I'm testing fast-carpenter with the Nvidia RAPIDS framework, as it offers a simple route to GPU support in Python and uses CUDA under the hood (see asnaylor@755ea35).
The test environment uses rapids=0.14 and python=3.6.

I replaced import pandas with import cudf and ran pytest:

============================================================================== test session starts ==============================================================================
platform linux -- Python 3.6.10, pytest-4.5.0, py-1.9.0, pluggy-0.13.1
rootdir: /home/anaylor/lzsim/fast-carpenter-fork, inifile: setup.cfg
plugins: dash-1.13.4, cov-2.7.1
collected 85 items

tests/test_event_builder.py ..                                                                                                                                            [  2%]
tests/test_expressions.py ...........                                                                                                                                     [ 15%]
tests/test_masked_tree.py ....F                                                                                                                                           [ 21%]
tests/test_tree_wrapper.py ..                                                                                                                                             [ 23%]
tests/backends/test_init.py .                                                                                                                                             [ 24%]
tests/define/test_reductions.py ........                                                                                                                                  [ 34%]
tests/define/test_systematics.py ...                                                                                                                                      [ 37%]
tests/define/test_variables.py .                                                                                                                                          [ 38%]
tests/selection/test_filters.py .F........                                                                                                                                [ 50%]
tests/selection/test_stage.py .....F.FF.                                                                                                                                  [ 62%]
tests/summary/test_binned_dataframe.py EEEEEssssssssEFEEFFF                                                                                                               [ 85%]
tests/summary/test_binning_config.py .FFF.......                                                                                                                          [ 98%]
tests/summary/test_event_level_dataframe.py .                                                                                                                             [100%]

==================================================================================== ERRORS =====================================================================================
E       AttributeError: module 'cudf' has no attribute 'IntervalIndex'
fast_carpenter/summary/binning_config.py:80: AttributeError

E           AttributeError: module 'cudf' has no attribute 'eval'
fast_carpenter/summary/binning_config.py:68: AttributeError
=================================================================================== FAILURES ====================================================================================
tests/test_masked_tree.py:88:
E                   TypeError: data must be list or dict-like
../../.pyenv/versions/miniconda3-4.3.30/envs/pytest_debug/lib/python3.6/site-packages/cudf/core/dataframe.py:228: TypeError

tests/selection/test_filters.py:43:
tests/selection/test_stage.py:64:
tests/selection/test_stage.py:127:
E       AttributeError: type object 'MultiIndex' has no attribute 'from_arrays'
fast_carpenter/selection/filters.py:107: AttributeError

tests/summary/test_binned_dataframe.py:203:
tests/summary/test_binning_config.py:19:
tests/summary/test_binning_config.py:42:
E           AttributeError: module 'cudf' has no attribute 'eval'
fast_carpenter/summary/binning_config.py:68: AttributeError

E               TypeError: memoryview: a bytes-like object is required, not 'list'
../../.pyenv/versions/miniconda3-4.3.30/envs/pytest_debug/lib/python3.6/site-packages/cudf/core/column/column.py:1524: TypeError
tests/summary/test_binned_dataframe.py:249:
E               NotImplementedError: cudf doesn't support list like data types
../../.pyenv/versions/miniconda3-4.3.30/envs/pytest_debug/lib/python3.6/site-packages/cudf/core/column/column.py:1359: NotImplementedError

tests/summary/test_binned_dataframe.py:294:
E               TypeError: unique() takes 1 positional argument but 2 were given
fast_carpenter/summary/binned_dataframe.py:107: TypeError

tests/summary/test_binned_dataframe.py:305:
tests/summary/test_binning_config.py:31:
E   AttributeError: module 'cudf' has no attribute 'Interval'
fast_carpenter/summary/binning_config.py:80: AttributeError

As the pytest results show, cuDF does not currently implement every pandas function: not every function is faster on a GPU (so those should just fall back to CPU pandas), and some implementations are still being written.
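The fall-back idea could be sketched roughly as below. This is only an illustration, not something fast-carpenter or cuDF provides: `FallbackNamespace` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for `cudf` and `pandas` so that the example runs without a GPU.

```python
import types


class FallbackNamespace:
    """Hypothetical helper: look names up on a primary (GPU) module first,
    then on a fallback (CPU) module if the primary lacks them."""

    def __init__(self, primary, fallback):
        self._primary = primary
        self._fallback = fallback
        self.fallback_hits = []  # names that had to be served by the fallback

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Refuse private
        # names so a half-initialised instance cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return getattr(self._primary, name)
        except AttributeError:
            self.fallback_hits.append(name)
            return getattr(self._fallback, name)


# Stand-ins for the real modules (assumption: in practice these would be
# the imported cudf and pandas modules).
gpu_like = types.SimpleNamespace(DataFrame="gpu DataFrame")
cpu_like = types.SimpleNamespace(DataFrame="cpu DataFrame",
                                 IntervalIndex="cpu IntervalIndex")

xd = FallbackNamespace(gpu_like, cpu_like)
print(xd.DataFrame)       # served by the GPU stand-in
print(xd.IntervalIndex)   # missing on the GPU stand-in, served by the CPU one
print(xd.fallback_hits)   # record of what fell back
```

A name-level fallback like this would not be enough on its own: objects created by the two libraries are different types, so data would still need converting at the boundary (cuDF has `to_pandas()` and `cudf.from_pandas` for that), and each conversion copies data between host and device.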

I also tried CuPy, replacing import numpy with import cupy and running pytest:

============================================================================== test session starts ==============================================================================
platform linux -- Python 3.6.10, pytest-4.5.0, py-1.9.0, pluggy-0.13.1
rootdir: /home/anaylor/lzsim/fast-carpenter-fork, inifile: setup.cfg
plugins: dash-1.13.4, cov-2.7.1
collected 85 items

tests/test_event_builder.py ..                                                                                                                                            [  2%]
tests/test_expressions.py .F.FFFF...F                                                                                                                                     [ 15%]
tests/test_masked_tree.py FFFFF                                                                                                                                           [ 21%]
tests/test_tree_wrapper.py F.                                                                                                                                             [ 23%]
tests/backends/test_init.py .                                                                                                                                             [ 24%]
tests/define/test_reductions.py .F......                                                                                                                                  [ 34%]
tests/define/test_systematics.py F..                                                                                                                                      [ 37%]
tests/define/test_variables.py F                                                                                                                                          [ 38%]
tests/selection/test_filters.py .FFFFFFFFF                                                                                                                                [ 50%]
tests/selection/test_stage.py .....FFFFF                                                                                                                                  [ 62%]
tests/summary/test_binned_dataframe.py EEEEEEEEEEEEEEFEEFFF                                                                                                               [ 85%]
tests/summary/test_binning_config.py .FFF.......                                                                                                                          [ 98%]
tests/summary/test_event_level_dataframe.py .                                                                                                                             [100%]

==================================================================================== ERRORS =====================================================================================
E           AttributeError: module 'cupy' has no attribute 'insert'
fast_carpenter/summary/binning_config.py:77: AttributeError

E               AttributeError: module 'cupy' has no attribute 'array_equal'
fast_carpenter/expressions.py:90: AttributeError

=================================================================================== FAILURES ====================================================================================
../../.pyenv/versions/miniconda3-4.3.30/envs/pytest_debug/lib/python3.6/site-packages/cupy/sorting/count.py:24: in count_nonzero
    return _count_nonzero(a, axis=axis)
E   TypeError: Argument 'a' has incorrect type (expected <class 'cupy.core.core.ndarray'>, got <class 'awkward.array.masked.IndexedMaskedArray'>)
cupy/core/reduction.pxi:262: TypeError

tests/test_expressions.py:53:
cupy/core/_kernel.pyx:906: in cupy.core._kernel.ufunc.__call__
E   TypeError: Unsupported type <class 'numpy.ndarray'>
cupy/core/_kernel.pyx:90: TypeError

tests/test_expressions.py:63:
E   TypeError: iteration over a 0-d array
cupy/core/core.pyx:1146: TypeError

tests/test_masked_tree.py:31:
E       AssertionError
../../.pyenv/versions/miniconda3-4.3.30/envs/pytest_debug/lib/python3.6/site-packages/cupy/logic/truth.py:30: AssertionError

tests/test_masked_tree.py:56:
E           ValueError: object __array__ method not producing an array
../../.local/lib/python3.6/site-packages/awkward/array/jagged.py:592: ValueError

tests/test_tree_wrapper.py:10:
cupy/core/_kernel.pyx:906: in cupy.core._kernel.ufunc.__call__
E   TypeError: Unsupported type <class 'awkward.array.jagged.JaggedArray'>
cupy/core/_kernel.pyx:90: TypeError

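Again, CuPy doesn't have all the functions NumPy does (comparison table here), and sometimes the behaviour is slightly different. Another issue is that it does not seem to play well with awkward arrays: several of the failures above are TypeErrors raised when CuPy is handed an awkward JaggedArray or IndexedMaskedArray.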
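One way to get an overview of the gaps before running the whole test suite would be to check which module-level names a candidate backend lacks. The sketch below is hypothetical (`missing_attributes` is not part of any of these libraries) and only inspects top-level attributes, so it would not catch cases such as the missing `MultiIndex.from_arrays` or behavioural differences. The name lists are taken from the AttributeErrors in the logs above.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the entries of `names` that `module_name` does not provide.

    Returns None if the module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in names if not hasattr(module, name)]


# Names taken from the AttributeErrors reported by pytest.
NUMPY_NAMES = ["insert", "array_equal"]
PANDAS_NAMES = ["IntervalIndex", "Interval", "eval"]

for module_name, names in [("cupy", NUMPY_NAMES), ("cudf", PANDAS_NAMES)]:
    missing = missing_attributes(module_name, names)
    if missing is None:
        print(f"{module_name}: not installed")
    else:
        print(f"{module_name}: missing {missing}")
```

The result depends on the installed versions, so the list of missing names may differ from what was seen with rapids=0.14.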

Although this initial testing did not succeed in adding GPU support to fast-carpenter through Nvidia RAPIDS, I think that with some tweaking of the code we might be able to use cuDF to accelerate parts of it. Alternatively, if other people know of any other libraries that run pandas or numpy on GPUs, we could use those. I would suggest having a separate pip package (e.g. fast-carpenter-gpu) for GPU support, like TensorFlow does; this way we could code some of the functions slightly differently to take advantage of the GPU.

It might also be worth investigating Dask-cuDF, as cuDF only supports a single GPU, whereas Dask-cuDF allows you to use cuDF across multiple GPUs on a single machine or across many machines in a cluster.

I appreciate this is a long issue, but what are your thoughts?

@kreczko
Contributor

kreczko commented Dec 8, 2020

With awkward 1 (#141) we will get CUDA kernels for the individual arrays.
This, combined with improved Numba compatibility, should give us a lot of flexibility for the implementation:
[image]

Once the transition to awkward 1 is done, it would be good to revisit this issue and move the core algorithms to the GPU. TensorFlow has recently merged its GPU and CPU packages; maybe we can go the same route. One of the Swift-HEP propositions was for code to select the best implementation based on the hardware available, and I think this would make sense here as well. Having one package and an identical interface (YAML) for both the CPU and GPU versions is nice from a user perspective. To start, we could add a --gpu flag to fast_carpenter.
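As a rough illustration of the --gpu idea, the sketch below pairs a command-line flag with a check of whether a GPU library is importable, and falls back to the CPU otherwise. Everything except the flag name is an assumption: fast_carpenter's real argument parser is not shown in this thread, `select_backend` and `gpu_backend_available` are hypothetical helpers, and `cupy` is just one possible library to probe. Finding an installed package also does not guarantee that a usable GPU is present.

```python
import argparse
import importlib.util


def gpu_backend_available(module_name="cupy"):
    """Return True if the named GPU library is installed (importable)."""
    return importlib.util.find_spec(module_name) is not None


def select_backend(want_gpu, gpu_available):
    """Pick "gpu" only when it is both requested and available."""
    if want_gpu and gpu_available:
        return "gpu"
    return "cpu"


def build_parser():
    # Hypothetical stand-in for the real fast_carpenter parser.
    parser = argparse.ArgumentParser(prog="fast_carpenter")
    parser.add_argument("--gpu", action="store_true",
                        help="use a GPU backend if one is available")
    return parser


# Example: parse an explicit argument list rather than sys.argv.
args = build_parser().parse_args(["--gpu"])
backend = select_backend(args.gpu, gpu_backend_available())
print(f"selected backend: {backend}")
```

Keeping the selection in one function means the YAML interface could stay identical for both versions, with only the backend lookup changing.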

@kreczko
Contributor

kreczko commented Mar 28, 2022

This should now be possible with #144. It is third on my TODO list.
