
std::runtime_error -- what(): qds_device::wait() unexpected command state #1751

Open
hecmay opened this issue Sep 5, 2024 · 5 comments


@hecmay

hecmay commented Sep 5, 2024

I was running the cascade matrix multiplication example from the programming examples: https://github.com/Xilinx/mlir-aie/tree/main/programming_examples/basic/matrix_multiplication/cascade

I noticed that the code only works when M or m is greater than 64. When I set them to smaller values (say, M = 32, n = 32), the host code throws the following error:

Running Kernel (iteration 0).
terminate called after throwing an instance of 'std::runtime_error'
  what():  qds_device::wait() unexpected command state
Aborted (core dumped)
make: *** [/home/ubuntu/mlir-aie-test/programming_examples/basic/matrix_multiplication/cascade/../makefile-common:114: run] Error 134

The XCLBIN compilation finishes without any errors, so this looks like a problem in the runtime. Any idea how this can be fixed?
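Since the failure depends on the problem sizes, one way to map exactly which sizes trigger it is to sweep them and record pass/fail. Below is a minimal Python sketch, assuming the example's makefile accepts `M` and `m` as make variables and that `make clean` forces the design to be rebuilt; the variable names, the sweep values, and the rule that the tile size must divide `M` are all assumptions, not taken from the example. Note that make's `Error 134` is the shell's 128 + 6 exit status, meaning the host process was killed by SIGABRT after the uncaught `std::runtime_error`, so the sketch simply treats any nonzero exit as a failure.

```python
import itertools
import subprocess

# Hypothetical sweep values; adjust to the sizes you want to probe.
PROBLEM_M = [32, 64, 128, 256]
TILE_M = [16, 32, 64]


def build_configs(problem_sizes, tile_sizes):
    """Return (M, m) pairs, keeping only tiles that evenly divide M.

    The divisibility filter is an assumption about what the design accepts.
    """
    return [
        (M, m)
        for M, m in itertools.product(problem_sizes, tile_sizes)
        if m <= M and M % m == 0
    ]


def make_command(M, m):
    """Build the make invocation for one size (variable names are assumed)."""
    return ["make", "run", f"M={M}", f"m={m}"]


def run_config(M, m, timeout_s=900):
    """Return True if a clean rebuild plus `make run` exits with status 0."""
    try:
        # Clean first so the build artifacts are regenerated for the new sizes.
        subprocess.run(["make", "clean"], capture_output=True, text=True, timeout=timeout_s)
        result = subprocess.run(make_command(M, m), capture_output=True, text=True, timeout=timeout_s)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # make is missing or the run hung: count it as a failure.
        return False
    return result.returncode == 0


def sweep(configs, runner=run_config):
    """Map each (M, m) configuration to its pass/fail result."""
    return {cfg: runner(*cfg) for cfg in configs}


if __name__ == "__main__":
    for (M, m), ok in sweep(build_configs(PROBLEM_M, TILE_M)).items():
        print(f"M={M:4d} m={m:3d}: {'PASS' if ok else 'FAIL'}")
```

Run from the `cascade` directory, this prints one PASS/FAIL line per (M, m) pair. The `runner` parameter exists so the same loop can be reused with a different command, for example one that varies `n_cores` in the matrix-vector example.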

@hecmay
Author

hecmay commented Sep 5, 2024

I am running into exactly the same error in the matrix-vector example when n_cores > 1: https://github.com/Xilinx/mlir-aie/blob/main/programming_examples/basic/matrix_multiplication/matrix_vector/aie2.py#L21

@hecmay
Author

hecmay commented Sep 6, 2024

I am not sure whether this error is caused by my environment setup. I am using a Phoenix Point mini PC (Minisforum UM790 Pro, AMD Ryzen™ 9 7940HS). I followed every step in this README: https://github.com/Xilinx/mlir-aie/blob/main/docs/buildHostLin.md

Linux kernel: 6.10
Vitis 2023.2
AMDXDNA: 2.18.0_20240825, 537a509a3ab1b698c9c9f6ebcd88035b2fe8359b

Can anyone reproduce the issue? Any help would be highly appreciated. Thanks! @stephenneuendorffer @fifield @hunhoffe @Yu-Zhewen @makslevental

@PisonJay

Me too. I am getting the same error with a Ryzen AI 9 365.

@PisonJay

PisonJay commented Sep 16, 2024

I guess the current infrastructure only supports the XDNA1 (AIE2) architecture. Strix Point (XDNA2, code name AIEP) is not supported yet. The same goes for Peano, which lists XDNA2 as "coming soon". The only usable runtime is the ONNX Runtime from the Ryzen AI SDK, which is currently only available on Windows.

@hecmay
Author

hecmay commented Sep 16, 2024


I am not so sure that's the cause. On my side, the program still runs in some cases, as long as the M/N/K values make the runtime happy.

Also, I do not think ONNX Runtime is used in these examples. The problem is more likely something odd in the Xilinx Runtime (XRT), which the host code uses: https://github.com/Xilinx/mlir-aie/blob/main/programming_examples/basic/matrix_multiplication/test.cpp#L23-L25
