
Unable to use variable with dtype('uint64') #132

Open
asnaylor opened this issue Aug 18, 2020 · 7 comments

Comments

@asnaylor
Collaborator

I am trying to bin a ROOT variable which is a ULong_t, but I get a TypeError:

TypeError: Iterator operand 1 dtype could not be cast from dtype('uint64') to dtype('int64') according to the rule 'safe'

This is my config file:

stages:
    - define_vars: fast_carpenter.Define
    - output: fast_carpenter.BinnedDataframe

define_vars:
    variables:
        - evtID: eventHeader.eventID

output:
    binning:
        - {in: evtID}

versions:

fast-carpenter==0.18.2
numpy==1.19.1
coffea==0.6.42

Full traceback:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/bin/fast_carpenter", line 8, in <module>
    sys.exit(main())
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/__main__.py", line 64, in main
    results, _ = backend.execute(sequence, datasets, args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/backends/coffea.py", line 100, in execute
    out = run_uproot_job(coffea_datasets, 'events', fp, executor, executor_args=exe_args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 1068, in run_uproot_job
    executor(chunks, closure, wrapped_out, **exe_args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 567, in futures_executor
    _futures_handler(futures, accumulator, status, unit, desc, add_fn, tailtimeout)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 197, in _futures_handler
    add_fn(output, finished.pop().result())
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
TypeError: Iterator operand 1 dtype could not be cast from dtype('uint64') to dtype('int64') according to the rule 'safe'

I have also had this issue with alphatwirl.

@benkrikler
Member

The traceback you've shared is the secondary one, caused by the underlying issue. I think there will be another traceback above that one; could you share it as well?

@asnaylor
Collaborator Author

Ah right, yes, I forgot to copy the full traceback:

$ fast_carpenter dataset_cfgs/salt_test.yml processing_cfgs/salt_cfg.yml --mode coffea:local --ncores 4
Preprocessing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.50s/file]
Processing:   0%|                                                                                                                                       | 0/1 [00:01<?, ?chunk/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 139, in __call__
    out = self.function(*args, **kwargs)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 833, in _work_function
    raise e
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 794, in _work_function
    out = processor_instance.process(df)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/backends/coffea.py", line 64, in process
    work.event(chunk)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/define/variables.py", line 72, in event
    result = full_evaluate(chunk.tree, expression, fill_missing,
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/define/variables.py", line 143, in full_evaluate
    result = evaluate(tree, expression)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/expressions.py", line 150, in evaluate
    result = numexpr.evaluate(cleaned_expression, local_dict=adaptor)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/numexpr/necompiler.py", line 834, in evaluate
    return compiled_ex(*arguments, **kwargs)
TypeError: Iterator operand 1 dtype could not be cast from dtype('uint64') to dtype('int64') according to the rule 'safe'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/bin/fast_carpenter", line 8, in <module>
    sys.exit(main())
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/__main__.py", line 64, in main
    results, _ = backend.execute(sequence, datasets, args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/fast_carpenter/backends/coffea.py", line 100, in execute
    out = run_uproot_job(coffea_datasets, 'events', fp, executor, executor_args=exe_args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 1068, in run_uproot_job
    executor(chunks, closure, wrapped_out, **exe_args)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 567, in futures_executor
    _futures_handler(futures, accumulator, status, unit, desc, add_fn, tailtimeout)
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/site-packages/coffea/processor/executor.py", line 197, in _futures_handler
    add_fn(output, finished.pop().result())
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/anaylor/.pyenv/versions/miniconda3-4.3.30/envs/fast_multi_tree/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
TypeError: Iterator operand 1 dtype could not be cast from dtype('uint64') to dtype('int64') according to the rule 'safe'

version: numexpr==2.7.1

@benkrikler
Member

Thanks, that's helpful. It tells me the problem is in the expressions module, so it's related to the Define stage. It actually looks like an internal issue in numexpr, since we don't do anything with the array dtypes explicitly.

Essentially all you're doing in that stage is creating an "alias", so it's surprising to see numexpr doing anything at all, but I can partially reproduce this type casting in numexpr with the following snippet:

In [1]: import numpy as np

In [2]: import numexpr as nepr

In [3]: a = np.arange(10, dtype="uint16")

In [4]: a
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)

In [5]: nepr.evaluate("a")
Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

I'll look around for any clues online as to why numexpr does this and what we can do to prvent it.

@benkrikler
Member

Oh, and the example I pasted uses uint16, which shows the casting happening but not the actual issue. If you swap to uint64, we see the exact error:

In [6]: a = np.arange(10, dtype="uint64")

In [7]: nepr.evaluate("a")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-a214fcad1da7> in <module>
----> 1 nepr.evaluate("a")

~/.conda/envs/fast-coffea/lib/python3.7/site-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    832     _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
    833     with evaluate_lock:
--> 834         return compiled_ex(*arguments, **kwargs)
    835 
    836 

TypeError: Iterator operand 1 dtype could not be cast from dtype('uint64') to dtype('int64') according to the rule 'safe'
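Putting the two sessions side by side, here is a minimal sketch of what seems to be going on. It assumes (from our reading of numexpr 2.x, not from its documentation) that numexpr only has signed int32 and int64 integer types internally, and maps every integer input onto one of them; numexpr_like_int_type is a hypothetical helper, not part of numexpr's API. np.can_cast then tells us whether numpy's 'safe' rule allows that conversion.

```python
import numpy as np

def numexpr_like_int_type(dtype):
    """Hypothetical helper mimicking (by assumption) numexpr 2.x's integer choice.

    numexpr is assumed to work with signed int32 and int64 only, so unsigned
    inputs must be mapped onto one of those two types.
    """
    dtype = np.dtype(dtype)
    if dtype.kind not in "iu":
        raise ValueError(f"not an integer dtype: {dtype}")
    # Wider than 32 bits, or unsigned 32-bit, needs the 64-bit signed type
    if dtype.itemsize > 4 or (dtype.kind == "u" and dtype.itemsize == 4):
        return np.dtype("int64")
    return np.dtype("int32")

def check(dtype):
    """Return the assumed internal type and whether numpy calls the cast 'safe'."""
    dtype = np.dtype(dtype)
    target = numexpr_like_int_type(dtype)
    return target, np.can_cast(dtype, target, casting="safe")

for name in ["uint16", "uint32", "uint64"]:
    target, safe = check(name)
    print(f"{name} -> {target}: safe={safe}")
```

Under these assumptions uint16 lands on int32 (as in Out[5] above) and the cast is allowed, while uint64 lands on int64 and numpy refuses it, which matches the TypeError.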

@benkrikler
Member

benkrikler commented Aug 19, 2020

Hmm, this seems like it could be something fundamental to numexpr: https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html#casting-rules. The built-in casting seems to be there to simplify the internal operations considerably.

Internally it must be using numpy's astype method, which I think throws this exception. Its 'safe' casting mode is described here: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html.
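numpy's 'safe' rule can be seen on its own, without numexpr, by calling ndarray.astype with casting="safe". This is just an illustration of the rule; whether numexpr really goes through astype internally is only a guess, as noted above.

```python
import numpy as np

small = np.arange(5, dtype=np.uint16)
big = np.arange(5, dtype=np.uint64)

# uint16 -> int32 keeps every possible value, so 'safe' allows it
widened = small.astype(np.int32, casting="safe")
print(widened.dtype)

# uint64 -> int64 could change values above 2**63 - 1, so 'safe' refuses,
# based on the dtypes alone and regardless of the values actually stored
raised = False
try:
    big.astype(np.int64, casting="safe")
except TypeError as err:
    raised = True
    print("TypeError:", err)
```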

The issue with uint64 --> int64 is that for numbers greater than or equal to 2**63, the same combination of bits becomes a negative number, i.e. you cannot guarantee the same interpretation of the actual number. Smaller unsigned types like uint16 can be turned into signed integers safely, provided you also double the number of bits (i.e. uint16 --> int32 is safe, and so is uint32 --> int64). For uint64, though, that would need a 128-bit integer, and numpy has no 128-bit integer type.
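The bit-level reinterpretation described above can be seen directly with ndarray.view, which relabels the raw bytes instead of converting the values. This is an illustration of the sign problem only, not a claim about how numexpr performs its cast.

```python
import numpy as np

# 2**63 - 1 is the largest int64; 2**63 and above only fit in uint64
big = np.array([2**63 - 1, 2**63, 2**63 + 1], dtype=np.uint64)

# view() reinterprets the same 64 bits as signed integers, without converting
as_signed = big.view(np.int64)
print(as_signed)
```

The first value is unchanged, while 2**63 and 2**63 + 1 come out as -2**63 and -2**63 + 1.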

All of which is to say: I'm not yet sure what we can do about this while we rely on numexpr. In this specific case, if we fixed the issue with BinnedDataframe having trouble with full stops in branch names, you wouldn't need the alias and we'd be fine, but it still means that uint64 variables cannot be used in formulae right now.
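Since the Define stage here only creates an alias, one option would be to skip numexpr entirely when the expression is just the name of an existing variable. The sketch below is hypothetical: evaluate_or_alias is not a fast_carpenter function, and it takes the evaluator as a parameter so that numexpr.evaluate (or anything with the same ex/local_dict call shape) can be plugged in. It does nothing for uint64 variables used inside real formulae.

```python
import numpy as np

def evaluate_or_alias(expression, variables, evaluator):
    """Hypothetical helper: bypass the expression engine for a bare alias.

    If `expression` is exactly the name of an existing variable, return that
    array untouched (keeping its dtype, e.g. uint64). Otherwise hand the
    expression to `evaluator`, e.g. numexpr.evaluate.
    """
    name = expression.strip()
    if name in variables:
        return variables[name]
    return evaluator(name, local_dict=variables)

def failing_evaluator(ex, local_dict):
    # Stand-in for numexpr.evaluate on uint64 input, which raises TypeError
    raise TypeError("this would have gone through numexpr")

arrays = {"eventHeader.eventID": np.arange(3, dtype=np.uint64)}
evt_id = evaluate_or_alias("eventHeader.eventID", arrays, failing_evaluator)
print(evt_id.dtype)  # uint64
```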

I'll keep thinking...

@benkrikler
Member

Actually, that was probably overly dramatic: I had a look at the numexpr code where this error comes from, and I can see there's a casting option to control this directly from the evaluate method:

In [12]: a = np.arange(2**63, 2**63 + 10, dtype="uint64")

In [13]: a
Out[13]:
array([9223372036854775808, 9223372036854775809, 9223372036854775810,
       9223372036854775811, 9223372036854775812, 9223372036854775813,
       9223372036854775814, 9223372036854775815, 9223372036854775816,
       9223372036854775817], dtype=uint64)

In [14]: nepr.evaluate("a", casting="unsafe")
Out[14]:
array([-9223372036854775808, -9223372036854775807, -9223372036854775806,
       -9223372036854775805, -9223372036854775804, -9223372036854775803,
       -9223372036854775802, -9223372036854775801, -9223372036854775800,
       -9223372036854775799], dtype=int64)

In [15]: nepr.evaluate("a")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-a214fcad1da7> in <module>
----> 1 nepr.evaluate("a")

~/.conda/envs/fast-coffea/lib/python3.7/site-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    832     _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
    833     with evaluate_lock:
--> 834         return compiled_ex(*arguments, **kwargs)
    835 
    836 

TypeError: Iterator operand 1 dtype could not be cast from dtype('uint64') to dtype('int64') according to the rule 'safe'

That code demonstrates why this operation is "unsafe": what was a positive integer is now negative. I'm not sure what's best to do in this case, then. On the one hand, we need to be able to support uint64s; on the other hand, there might be other "unsafe" operations that could cause bugs in an analysis if we just used "unsafe" mode for all operations.
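One possible middle ground between the default 'safe' rule and a global casting="unsafe" would be to convert uint64 inputs to int64 explicitly, and only after checking that every value fits. This is a sketch of a hypothetical helper (uint64_to_int64_checked), not something numexpr or fast_carpenter provides; the converted array could then be passed to numexpr.evaluate with its default casting, so other unsafe casts would still be caught.

```python
import numpy as np

# Largest value int64 can hold, stored as uint64 so the comparison stays exact
INT64_MAX = np.uint64(np.iinfo(np.int64).max)

def uint64_to_int64_checked(values):
    """Hypothetical helper: cast uint64 -> int64 only if no value would change.

    Non-uint64 arrays are returned unchanged, so any other casting problems
    still surface under numexpr's default 'safe' rule.
    """
    values = np.asarray(values)
    if values.dtype != np.uint64:
        return values
    if values.size and values.max() > INT64_MAX:
        raise OverflowError("uint64 values >= 2**63 cannot be represented as int64")
    # Every value fits, so this (by default 'unsafe') astype changes none of them
    return values.astype(np.int64)

event_ids = np.arange(10, dtype=np.uint64)
converted = uint64_to_int64_checked(event_ids)
print(converted.dtype)  # int64
```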

@kreczko
Contributor

kreczko commented Dec 8, 2020

This looks like something I need to consider as part of #141 too.
I will add tests for this issue for both numpy and awkward.Array.
