Segfault when calling read_tsv() on an HPC cluster #533

Open
smped opened this issue Apr 3, 2024 · 0 comments

Hi,

I'm having an issue with read_tsv() which appears to be the segfault mentioned here: #510

I'm calling the function inside a conda environment on an HPC cluster. Running it interactively on the file in the conda environment on the head node works fine, but when it runs as a job within the cluster I get a segfault every time, which is well beyond my skill level to debug.

The error I see in my log files is:

*** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: vroom_(file, delim = delim %||% col_types$delim, col_names = col_names,     col_types = col_types, id = id, skip = skip, col_select = col_select,     name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws,     escape_double = escape_double, escape_backslash = escape_backslash,     comment = comment, skip_empty_rows = skip_empty_rows, locale = locale,     guess_max = guess_max, n_max = n_max, altrep = vroom_altrep(altrep),     num_threads = num_threads, progress = progress)
 2: vroom::vroom(file, delim = "\t", col_names = col_names, col_types = col_types,     col_select = {        {            col_select        }    }, id = id, .name_repair = name_repair, skip = skip, n_max = n_max,     na = na, quote = quote, comment = comment, skip_empty_rows = skip_empty_rows,     trim_ws = trim_ws, escape_double = TRUE, escape_backslash = FALSE,     locale = locale, guess_max = guess_max, show_col_types = show_col_types,     progress = progress, altrep = lazy, num_threads = num_threads)
 3: fn(x)
 4: FUN(X[[i]], ...)
 5: lapply(rna_files, function(x) {    ln <- readLines(x, 1)    fn <- paste0("read_", ifelse(grepl("\\t", ln), "tsv", "csv"))    fn <- match.fun(fn)    df <- fn(x)    gn_col <- intersect(c("gene_id", "Geneid"), names(df))[[1]]    fc_col <- intersect(c("logFC", "logfc"), names(df))[[1]]    fdr_col <- intersect(c("fdr", "FDR", "adjP", "adj_p"), names(df))[[1]]    dplyr::select(df, gene_id = !!sym(gn_col), logFC = !!sym(fc_col),         FDR = !!sym(fdr_col))})
 6: lapply(rna_files, function(x) {    ln <- readLines(x, 1)    fn <- paste0("read_", ifelse(grepl("\\t", ln), "tsv", "csv"))    fn <- match.fun(fn)    df <- fn(x)    gn_col <- intersect(c("gene_id", "Geneid"), names(df))[[1]]    fc_col <- intersect(c("logFC", "logfc"), names(df))[[1]]    fdr_col <- intersect(c("fdr", "FDR", "adjP", "adj_p"), names(df))[[1]]    dplyr::select(df, gene_id = !!sym(gn_col), logFC = !!sym(fc_col),         FDR = !!sym(fdr_col))})

Is the vroom release mentioned in the above issue likely to be published soon? I notice it's still at v1.6.5.***.

Relevant package versions and the HPC OS are below; however, this output is from the head node. When I look at other files where I've printed sessionInfo() while running on the cluster, I don't seem to get the "Running under: Red Hat Enterprise Linux 8.4 (Ootpa)" line, or the "Matrix products: default" and "BLAS/LAPACK: /hpcfs/users/******/envs/f4994948c5b33369acc304940a5fa825_/lib/libopenblasp-r0.3.26.so; LAPACK version 3.12.0" lines. I'm not sure whether that's helpful information.

sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.4 (Ootpa)

Matrix products: default
BLAS/LAPACK: /hpcfs/users/******/envs/f4994948c5b33369acc304940a5fa825_/lib/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] vroom_1.6.5 readr_2.1.5

loaded via a namespace (and not attached):
 [1] utf8_1.2.4       R6_2.5.1         tidyselect_1.2.0 bit_4.0.5       
 [5] tzdb_0.4.0       magrittr_2.0.3   glue_1.7.0       tibble_3.2.1    
 [9] pkgconfig_2.0.3  bit64_4.0.5      lifecycle_1.0.4  cli_3.6.2       
[13] fansi_1.0.6      vctrs_0.6.5      compiler_4.3.3   hms_1.1.3       
[17] pillar_1.9.0     crayon_1.5.2     rlang_1.1.3  