Error: C stack usage is too close to the limit (R)


珏世佳人 · Published 2023-06-26 22:51:05

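While running a Mendelian randomization analysis, I first hit a stack overflow when processing UKBB data. That was 5+ GB of data, so an overflow was understandable. This time I had only read in a 700 MB exposure file when this error appeared: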

Error: C stack usage is too close to the limit

Something must be wrong somewhere, because the same script used to read 1+ GB exposure files without any trouble.

Original tutorials online:

Post 1: C Stack Usage is close to the limit and how to solve it
Post 2: Error: C stack usage is too close to the limit
Post 3: Error: C stack usage is too close to the limit at R startup (2022-3-28)

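In translation, the problem has the following known causes, plus any number of others.

1. Code problem: a recursive function nests too deeply and exhausts the C stack.

~~When I was first learning to program, my teacher said: write an infinite loop, and when you are on your deathbed and asked for your last wish, answer that your loop on the server has not finished yet, so you cannot die...~~

Example 1: merging a huge data table with another data table that does not exist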

dt <- merge(dt, another_dt, by = 'ID', all.x = TRUE)

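If you really do need recursion, put a "terminator" in every recursively called function. The terminator conditionally stops further recursive calls. The best approach is to track the recursion depth and stop once it reaches a given limit, before the system raises the error.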
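The terminator idea can be sketched as follows. This is a minimal illustration, not code from the original posts: the function `sum_nested` and its `max_depth` parameter are hypothetical names chosen for the example.

```r
# Recursively sum a nested list, aborting once `max_depth` levels are exceeded.
# `max_depth` is an illustrative limit, not a value taken from the posts.
sum_nested <- function(x, depth = 1L, max_depth = 100L) {
  if (depth > max_depth) {
    # The "terminator": fail early with a clear message of our own
    stop("recursion deeper than ", max_depth, " levels; aborting")
  }
  if (!is.list(x)) {
    return(sum(x))
  }
  total <- 0
  for (item in x) {
    total <- total + sum_nested(item, depth + 1L, max_depth)
  }
  total
}

nested <- list(1, list(2, 3), list(list(4)))
result <- sum_nested(nested)

# With a deliberately small limit the terminator fires instead of the C stack check
too_deep <- tryCatch(
  sum_nested(nested, max_depth = 2L),
  error = function(e) conditionMessage(e)
)
```

Passing the depth as an argument keeps the function free of global state, so the limit can be tuned per call.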

Watch out for cases where the recursion is less obvious:

library(debug)
mtrace(lapply)

Example 2: accidentally building a super-long string. The argument was written as col where sep was intended. (Source: Stack Overflow 14719349)

output_table_subset = mutate(big_data_frame,
     combined_table = paste0(first_part, second_part, col = "_"))

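Written like this, the problem does not occur. Note that paste0() has no sep argument (a named sep = is simply pasted on as one more string), so paste() is used here:

output_table_subset = mutate(big_data_frame,
     combined_table = paste(first_part, second_part, sep = "_"))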

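Example 3: two functions calling each other indefinitely. (Source: Stack Overflow 14719349)
In other words: "you remind me to remind you."
Script 1 contains a command that calls script 2. Once it runs, script 2 is executed from script 1, and script 2 in turn contains a command that runs script 1. So execution goes script 1 → script 2 → script 1 → script 2 → script 1, and so on.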

change_to_factor <- function(x){
  x <- change_to_character(x)
  as.factor(x)
} 

change_to_character <- function(x){
  x <- change_to_factor(x)
  as.character(x)
}

change_to_character("1")

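In short, check whether something is running indefinitely. If not, it may be a bug in a package. For example, the Google Sheets package is known to throw this error in certain versions.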
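One way to break the cycle in Example 3, sketched under the assumption that the two helpers only need to convert types, is to have each function do its own conversion with base R instead of delegating to the other:

```r
# Each helper converts directly, so neither calls the other.
change_to_factor <- function(x) {
  as.factor(as.character(x))
}

change_to_character <- function(x) {
  as.character(x)
}

chr <- change_to_character(factor("1"))  # a character vector
fct <- change_to_factor(1)               # a factor with a single level
```

The point is only that the call graph no longer contains a loop. Whether these are the right conversions depends on what the original helpers were meant to do.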

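Example 4: running a script over and over. This can happen when copying and pasting code and accidentally bringing along a line such as source("path/file/balabala.R"). Fix: delete the line or comment it out. Blunt but effective.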

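Example 5: a wrong column class when mapping with leaflet(). The poster was trying to map a data frame whose date column was of class POSIXlt; changing it back to POSIXct solved the problem. Another user's experience: if you use plot_ly, check which columns you are passing. It seems that POSIXlt/POSIXct columns must be converted with as.character() before being passed to plotly, otherwise you get this error.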
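A quick way to find and convert such columns before plotting is sketched below. The data frame `df` and its column names are made up for illustration; only base R functions are used.

```r
# Toy data frame with a POSIXlt column (assigning with `$<-` keeps the class)
df <- data.frame(id = 1:2)
df$when <- as.POSIXlt(c("2023-06-26 22:51:05", "2023-06-27 08:00:00"), tz = "UTC")

# Flag every column that inherits from POSIXlt
is_lt <- vapply(df, inherits, logical(1), what = "POSIXlt")

# Convert the flagged columns to POSIXct one by one
for (col in names(df)[is_lt]) {
  df[[col]] <- as.POSIXct(df[[col]])
}

class(df$when)
```

For the plot_ly case described above, `as.character()` could be applied in the loop instead of `as.POSIXct()`.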

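Example 6: the caret package has a function called createDataPartition that reportedly always fails with this error when the data set to be partitioned has more than 1 million rows. For your information only.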

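2. Configuration problem:

The stack size is an operating-system parameter that can be adjusted per process. As far as I know you cannot change it from inside R, but you can change it from the shell with the ulimit command before starting R. In other words, do it in a terminal. The new limit applies only to that shell session and to processes started from it, so restart R from the same shell afterwards.

It works like this:

Check the default ulimit -s value: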

$ ulimit -s # print default
8192
$ R --slave -e 'Cstack_info()["size"]'
size
8388608

8388608 = 1024 * 8192: ulimit -s reports kilobytes, whereas Cstack_info() reports bytes.

Raise the stack limit to 16 MB:

$ ulimit -s 16384 # enlarge stack limit to 16 megs
$ R --slave -e 'Cstack_info()["size"]'
size
16777216
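The same check can be done from inside a running R session. The sketch below only reads the values returned by `Cstack_info()`; the variable names are illustrative, and on some platforms the size may be reported as NA.

```r
info <- Cstack_info()               # named integer vector
size_bytes <- info[["size"]]        # stack size in bytes (may be NA)
size_kb <- size_bytes / 1024        # same unit as `ulimit -s`
used_fraction <- info[["current"]] / size_bytes

cat("stack size (KB):", size_kb, "\n")
cat("fraction currently in use:", used_fraction, "\n")
```

If the size in KB matches the number printed by `ulimit -s` in the shell that launched R, the new limit was picked up.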

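3. Program problem:

Some people solved the problem by installing a newer version of R, but relatively few hit the error for this reason, so think carefully before choosing this route: after a reinstall some packages must be reinstalled too, and some are very hard to install. Before deciding, test whether local RStudio and R in a terminal can run the script. If they can, then R, Ubuntu, and the script are all fine, and the problem lies only with RStudio Server.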

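Example 1: one user had a custom function that was an upgraded version of the base function:

saveRDS()
By accident, however, the custom function was itself named saveRDS() instead of safe_saveRDS().
As a result, once that definition had been evaluated, the line that actually used saveRDS(…) (meant to call the original base version, not the upgraded one) called the custom function instead, which kept calling itself until the error above appeared and the session crashed.
So if you get this error when calling a save function, check whether you have accidentally run into the same thing.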
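A wrapper of this kind can be protected against self-calls by naming the base function explicitly. The body below is an assumption for illustration (the original post does not show what the upgraded version did); here it writes to a temporary file first and then renames it.

```r
# Hypothetical wrapper: write to a temp file, then move it into place.
safe_saveRDS <- function(object, file, ...) {
  tmp <- paste0(file, ".tmp")
  base::saveRDS(object, tmp, ...)   # explicit namespace: cannot resolve to a wrapper
  file.rename(tmp, file)
}

# The buggy variant, for comparison (left commented out on purpose):
# saveRDS <- function(object, file, ...) saveRDS(object, file, ...)

path <- tempfile(fileext = ".rds")
ok <- safe_saveRDS(mtcars, path)
restored <- readRDS(path)
```

With `base::saveRDS`, the inner call still reaches the base version even if the wrapper is later renamed to `saveRDS` by mistake.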

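Example 2: your .Rprofile file is, for some reason, recursively loading itself:

From Post 3, and very informative!

Note, however, that this post deals with the desktop version; what follows is a discussion among Mac users.
The situation looks like this.
The error: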

Error: C stack usage  7953936 is too close to the limit

After changing ulimit in the terminal as the online tutorials suggest, the error still appears:

Isabels-MacBook-Pro ~ % ulimit -s
8176

Isabels-MacBook-Pro ~ % R --slave -e 'Cstack_info()["size"]'
Error: C stack usage  7954496 is too close to the limit
Execution halted

Checking ulimit:

Isabels-MacBook-Pro ~ % ulimit
unlimited

Setting the stack limit to unlimited then produces a different error:

Isabels-MacBook-Pro ~ % ulimit -s unlimited

Isabels-MacBook-Pro ~ % R --slave -e 'Cstack_info()["size"]'
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted

Apparently the Cstack_info() call gets stuck in some infinite recursion.
So the poster looked at their session info:

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

and the contents of .Rprofile:

# REMEMBER to restart R after you modify and save this file!

# First, execute the global .Rprofile if it exists. You may configure blogdown
# options there, too, so they apply to any blogdown projects. Feel free to
# ignore this part if it sounds too complicated to you.
if (file.exists("~/.Rprofile")) {
  base::sys.source("~/.Rprofile", envir = environment())
}

# Now set options to customize the behavior of blogdown for this project. Below
# are a few sample options; for more options, see
# https://bookdown.org/yihui/blogdown/global-options.html
options(
  # to automatically serve the site on RStudio startup, set this option to TRUE
  blogdown.serve_site.startup = FALSE,
  # to disable knitting Rmd files on save, set this option to FALSE
  blogdown.knit.on_save = TRUE,
  # build .Rmd to .html (via Pandoc); to build to Markdown, set this option to 'm$
  blogdown.method = 'html'
)

# fix Hugo version
options(blogdown.hugo.version = "0.82.0")

Below is the content of /Library/Frameworks/R.framework/Resources/library/base/R/Profile:

### This is the system Rprofile file. It is always run on startup.
### Additional commands can be placed in site or user Rprofile files
### (see ?Rprofile).

### Copyright (C) 1995-2020 The R Core Team

### Notice that it is a bad idea to use this file as a template for
### personal startup files, since things will be executed twice and in
### the wrong environment (user profiles are run in .GlobalEnv).

.GlobalEnv <- globalenv()
attach(NULL, name = "Autoloads")
.AutoloadEnv <- as.environment(2)
assign(".Autoloaded", NULL, envir = .AutoloadEnv)
T <- TRUE
F <- FALSE
R.version <- structure(R.Version(), class = "simple.list")
version <- R.version            # for S compatibility

## for backwards compatibility only
R.version.string <- R.version$version.string

## NOTA BENE: options() for non-base package functionality are in places like
##            --------- ../utils/R/zzz.R

options(keep.source = interactive())
options(warn = 0)
# options(repos = c(CRAN="@CRAN@"))
# options(BIOC = "http://www.bioconductor.org")

## setting from an env variable added in 4.0.2
local({to <- as.integer(Sys.getenv("R_DEFAULT_INTERNET_TIMEOUT", 60))
    if (is.na(to) || to <= 0) to <- 60L
    options(timeout = to)
})
options(encoding = "native.enc")
options(show.error.messages = TRUE)
## keep in sync with PrintDefaults() in  ../../main/print.c :
options(scipen = 0)
options(max.print = 99999)# max. #{entries} in internal printMatrix()
options(add.smooth = TRUE)# currently only used in 'plot.lm'

if(isFALSE(as.logical(Sys.getenv("_R_OPTIONS_STRINGS_AS_FACTORS_",
                                 "FALSE")))) {
    options(stringsAsFactors = FALSE)
} else {
    options(stringsAsFactors = TRUE)
}

if(!interactive() && is.null(getOption("showErrorCalls")))
    options(showErrorCalls = TRUE)

local({dp <- Sys.getenv("R_DEFAULT_PACKAGES")
       if(identical(dp, "")) ## it fact methods is done first
           dp <- c("datasets", "utils", "grDevices", "graphics",
                   "stats", "methods")
       else if(identical(dp, "NULL")) dp <- character(0)
       else dp <- strsplit(dp, ",")[[1]]
       dp <- sub("[[:blank:]]*([[:alnum:]]+)", "\\1", dp) # strip whitespace
       options(defaultPackages = dp)
    })

## Expand R_LIBS_* environment variables.
Sys.setenv(R_LIBS_SITE =
           .expand_R_libs_env_var(Sys.getenv("R_LIBS_SITE")))
Sys.setenv(R_LIBS_USER =
           .expand_R_libs_env_var(Sys.getenv("R_LIBS_USER")))

local({
    if(nzchar(tl <- Sys.getenv("R_SESSION_TIME_LIMIT_CPU")))
        setSessionTimeLimit(cpu = tl)
    if(nzchar(tl <- Sys.getenv("R_SESSION_TIME_LIMIT_ELAPSED")))
        setSessionTimeLimit(elapsed = tl)
})

.First.sys <- function()
{
    for(pkg in getOption("defaultPackages")) {
        res <- require(pkg, quietly = TRUE, warn.conflicts = FALSE,
                       character.only = TRUE)
        if(!res)
            warning(gettextf('package %s in options("defaultPackages") was not found', sQuote(pkg)$
                    call. = FALSE, domain = NA)
    }
}

## called at C level in the startup process prior to .First.sys
.OptRequireMethods <- function()
{
    pkg <- "methods" # done this way to avoid R CMD check warning
    if(pkg %in% getOption("defaultPackages"))
        if(!require(pkg, quietly = TRUE, warn.conflicts = FALSE,
                    character.only = TRUE))
            warning('package "methods" in options("defaultPackages") was not found',
                    call. = FALSE)
}

if(nzchar(Sys.getenv("R_BATCH"))) {
    .Last.sys <- function()
    {
        cat("> proc.time()\n")
        print(proc.time())
    }
    ## avoid passing on to spawned R processes
    ## A system has been reported without Sys.unsetenv, so try this
    try(Sys.setenv(R_BATCH=""))
}

local({
    if(nzchar(rv <- Sys.getenv("_R_RNG_VERSION_")))
        suppressWarnings(RNGversion(rv))
})

.sys.timezone <- NA_character_
.First <- NULL
.Last <- NULL
###-*- R -*- Unix Specific ----

.Library <- file.path(R.home(), "library")
.Library.site <- Sys.getenv("R_LIBS_SITE")
.Library.site <- if(!nzchar(.Library.site)) file.path(R.home(), "site-library") else unlist(strspl$
.Library.site <- .Library.site[file.exists(.Library.site)]

invisible(.libPaths(c(unlist(strsplit(Sys.getenv("R_LIBS"), ":")),
                      unlist(strsplit(Sys.getenv("R_LIBS_USER"), ":")
                      ))))
local({
    popath <- Sys.getenv("R_TRANSLATIONS", "")
    if(!nzchar(popath)) {
        paths <- file.path(.libPaths(), "translations", "DESCRIPTION")
        popath <- dirname(paths[file.exists(paths)][1])
    }
    bindtextdomain("R", popath)
    bindtextdomain("R-base", popath)
    assign(".popath", popath, .BaseNamespaceEnv)
})
local({
## we distinguish between R_PAPERSIZE as set by the user and by configure
papersize <- Sys.getenv("R_PAPERSIZE_USER")
if(!nchar(papersize)) {
    lcpaper <- Sys.getlocale("LC_PAPER") # might be null: OK as nchar is 0
    papersize <- if(nchar(lcpaper))
        if(length(grep("(_US|_CA)", lcpaper))) "letter" else "a4"
    else Sys.getenv("R_PAPERSIZE")
}
options(papersize = papersize,
        printcmd = Sys.getenv("R_PRINTCMD"),
        dvipscmd = Sys.getenv("DVIPS", "dvips"),
        texi2dvi = Sys.getenv("R_TEXI2DVICMD"),
        browser = Sys.getenv("R_BROWSER"),
        pager = file.path(R.home(), "bin", "pager"),
        pdfviewer = Sys.getenv("R_PDFVIEWER"),
        useFancyQuotes = TRUE)
})

## non standard settings for the R.app GUI of the macOS port
if(.Platform$GUI == "AQUA") {
    ## this is set to let RAqua use both X11 device and X11/TclTk
    if (Sys.getenv("DISPLAY") == "")
        Sys.setenv("DISPLAY" = ":0")

    ## this is to allow gfortran compiler to work
    Sys.setenv("PATH" = paste(Sys.getenv("PATH"),":/usr/local/bin",sep = ""))
}## end "Aqua"

## de-dupe the environment on macOS (bug in Yosemite which affects things like PATH)
if (grepl("^darwin", R.version$os)) local({
    ## we have to de-dupe one at a time and re-check since the bug affects how
    ## environment modifications propagate
    while(length(dupes <- names(Sys.getenv())[table(names(Sys.getenv())) > 1])) {
        env <- dupes[1]
        value <- Sys.getenv(env)
        Sys.unsetenv(env)             ## removes the dupes, good
        .Internal(Sys.setenv(env, value)) ## wrapper requries named vector, a pain, hence internal
    }
})

local({
    tests_startup <- Sys.getenv("R_TESTS")
    if(nzchar(tests_startup)) source(tests_startup)
})

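Further things to check:

~/.Rprofile may contain some messy code. Try starting R with R --vanilla.
If R --vanilla works and ~/.Rprofile looks fine, also check whether /lib/R/etc/ contains symbolic links, for example to /etc/R/, where user settings are stored. Before reinstalling R, try deleting or renaming the /etc/R/ directory, because those files are not overwritten by a reinstall.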

The ~/.Rprofile shown by the poster above is recursively loading itself because of these lines:

if (file.exists("~/.Rprofile")) {
  base::sys.source("~/.Rprofile", envir = environment())
}

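Judging from the discussion above, these lines are inside ~/.Rprofile itself (~ expands to the user's home directory).

Delete these lines (or comment them out); they do not belong there. In fact, the file looks like a template for a project-specific .Rprofile. It makes sense in a project directory, but not as the user-wide .Rprofile.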

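The lookup logic for these files is as follows (in the order given by R's ?Startup documentation):

If the environment variable R_PROFILE_USER is set to the path of a file, R tries to load that file.
Otherwise, if there is a .Rprofile file in the current directory, R tries to load it.
Otherwise, if the file ~/.Rprofile exists, R tries to load that file.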

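This means that if a project-specific .Rprofile exists (that is, one in the current working directory), ~/.Rprofile is not loaded automatically. That is unfortunate, so many projects add lines like those above to their project-specific .Rprofile to have the user-wide ~/.Rprofile loaded as well. However, the implementation above ignores the R_PROFILE_USER environment variable.

A better implementation is therefore: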

rprofile = Sys.getenv('R_PROFILE_USER', '~/.Rprofile')
if (file.exists(rprofile)) {
  base::sys.source(rprofile, envir = environment())
}
rm(rprofile)
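Even with the snippet above, the same lines copied into the user-wide file would still make it load itself. A possible extra safeguard is a guard flag, sketched below. The function name `source_user_profile_once` and the option name `user.rprofile.sourced` are made up for this example and are not part of R or of the original discussion.

```r
# Hypothetical guard: source the user profile at most once per session.
source_user_profile_once <- function(
    path = Sys.getenv("R_PROFILE_USER", "~/.Rprofile"),
    envir = globalenv()) {
  if (isTRUE(getOption("user.rprofile.sourced"))) {
    return(invisible(FALSE))            # already loaded: stop the recursion here
  }
  options(user.rprofile.sourced = TRUE) # set the flag BEFORE sourcing
  if (file.exists(path)) {
    sys.source(path, envir = envir)
  }
  invisible(TRUE)
}

# Demonstration with a throwaway file instead of a real profile
demo_profile <- tempfile(fileext = ".R")
writeLines("profile_marker <- 42", demo_profile)
demo_env <- new.env()

options(user.rprofile.sourced = NULL)   # start from a clean state
first <- source_user_profile_once(demo_profile, demo_env)
second <- source_user_profile_once(demo_profile, demo_env)
```

Setting the flag before sourcing is the important design choice: if the sourced file calls the function again, the second call returns immediately instead of nesting deeper.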