CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered #239

Open
ilnamkang opened this issue Jun 1, 2024 · 0 comments


Computational environment

  • OS: Ubuntu 20.04.1
  • CUDA version:
    Cuda compilation tools, release 12.4, V12.4.131
    Build cuda_12.4.r12.4/compiler.34097967_0

Hi,

My job failed with the error message "CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered".

Is this due to a shortage of GPU memory?

I ran the job on a server with two Quadro RTX 8000 GPUs. Because I am allowed to use only one of them, I ran the following command before running colabfold_batch:
export CUDA_VISIBLE_DEVICES=0
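
To confirm that the export above actually reached the process, the variable can be inspected from Python before any GPU library is imported. This is a minimal sketch using only the standard library; `visible_gpu_ids` is a hypothetical helper name, not part of ColabFold. It may be worth running because the nohup log further down mentions both GPU 0 and GPU 1.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then exposes every GPU)
    and an empty list when it is set to an empty string (no GPU exposed).
    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # The variable is a comma-separated list of device indices or UUIDs.
    return [token.strip() for token in value.split(",") if token.strip()]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES ->", visible_gpu_ids())
```

If this prints `None` in the shell that launches colabfold_batch, the export did not carry over to that shell.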

My main command is below.
nohup colabfold_batch Hexamer.faa Hexamer.ColabFold --num-recycle 3 > nohup.log 2>&1 &

Below is the whole "log.txt" file created in the "Hexamer.ColabFold" directory.

2024-06-01 13:39:23,688 Running colabfold 1.5.5 (1648d2335943f9a483b6a803ebaea3e76162c788)
2024-06-01 13:39:23,887 Running on GPU
2024-06-01 13:39:24,307 Found 5 citations for tools or databases
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)
2024-06-01 13:39:25,934 Setting max_seq=508, max_extra_seq=828
2024-06-01 13:59:21,926 Could not predict Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-06-01 13:59:21,926 Done
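
For comparing several failed runs, the two facts that matter in this log (the query length and the CUDA error code) can be pulled out with a short script. This is a sketch under assumptions: `summarize` is a hypothetical helper, and the regular expressions are fitted only to the log lines shown above. Note that the code reported here is CUDA_ERROR_ILLEGAL_ADDRESS, which is distinct from CUDA_ERROR_OUT_OF_MEMORY, so the "Not Enough GPU memory?" hint in the message is a guess by the tool rather than a confirmed diagnosis.

```python
import re

# Two lines copied from the log.txt above, used as sample input.
LOG = """\
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)
2024-06-01 13:59:21,926 Could not predict Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
"""


def summarize(log_text):
    """Extract the query name, its length, and any CUDA error codes."""
    query = re.search(r"Query \d+/\d+: (\S+) \(length (\d+)\)", log_text)
    # Collect each distinct CUDA_ERROR_* identifier once, in sorted order.
    errors = sorted(set(re.findall(r"CUDA_ERROR_[A-Z_]+", log_text)))
    return {
        "query": query.group(1) if query else None,
        "length": int(query.group(2)) if query else None,
        "cuda_errors": errors,
    }


summary = summarize(LOG)
print(summary)
```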

Below is the whole "nohup.log" file.

nohup: ignoring input

WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
limited shared resource only capable of processing a few thousand MSAs per day. Please
submit jobs only from a single IP address. We reserve the right to limit access to the
server case-by-case when usage exceeds fair use. If you require more MSAs: You can 
precompute all MSAs with `colabfold_search` or host your own API and pass it to `--host-url`

2024-06-01 13:39:23,688 Running colabfold 1.5.5 (1648d2335943f9a483b6a803ebaea3e76162c788)
2024-06-01 13:39:23,887 Running on GPU
2024-06-01 13:39:24,307 Found 5 citations for tools or databases
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)

  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
E0601 13:59:21.874842   93333 gpu_timer.cc:156] INTERNAL: Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.874864   93333 gpu_timer.cc:162] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.874866   93333 gpu_timer.cc:168] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.895475   93333 se_gpu_pjrt_client.cc:634] Failed to query available memory for GPU 0
E0601 13:59:21.895972   93333 se_gpu_pjrt_client.cc:634] Failed to query available memory for GPU 1
2024-06-01 13:39:25,934 Setting max_seq=508, max_extra_seq=828
2024-06-01 13:59:21,926 Could not predict 2901385346_Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-06-01 13:59:21,926 Done

The input was a homohexamer with a total length of 5,856 aa.
A job with a homopentamer of the same protein (4,880 aa) finished successfully.
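
As a rough way to compare the two jobs, the sketch below assumes that the dominant memory cost grows with the square of the total sequence length, as the pair representation in AlphaFold-style models does. That scaling is an assumption for a back-of-the-envelope estimate, not a measurement of ColabFold's actual memory use, and `relative_pair_cost` is a hypothetical helper name.

```python
def relative_pair_cost(len_a, len_b):
    """Ratio of L x L pair-representation sizes for two total lengths."""
    return (len_a / len_b) ** 2


hexamer_len = 5856   # total length of the failed homohexamer job
pentamer_len = 4880  # total length of the successful homopentamer job

# Both totals correspond to the same monomer length.
monomer_len = hexamer_len // 6

ratio = relative_pair_cost(hexamer_len, pentamer_len)
print(f"monomer length: {monomer_len} aa")
print(f"hexamer / pentamer pair-size ratio: {ratio:.2f}")
```

Under this assumption the hexamer needs about 1.44 times the pair-representation memory of the pentamer, so a pentamer run that was already close to the GPU's limit would leave no room for the hexamer.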

Thanks.
