Closed
Description
I have recently begun experiencing a strange issue where a long-running process with many workers will segfault after about an hour with the following output:
signal (11): Segmentation fault
in expression starting at /srv/git/rys_nucleosomes/nested_sampling/dif_pos_learner.jl:82
sig_match_fast at /buildworker/worker/package_linux64/build/src/gf.c:2250 [inlined]
jl_lookup_generic_ at /buildworker/worker/package_linux64/build/src/gf.c:2332 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2394
serialize_any at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:648
serialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:627 [inlined]
serialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:272
serialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:2000
unknown function (ip: 0x7f303d394df5)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
serialize_msg at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/messages.jl:90
unknown function (ip: 0x7f303d392e05)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:655
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:705
#invokelatest#1 at ./essentials.jl:710 [inlined]
invokelatest at ./essentials.jl:709 [inlined]
send_msg_ at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/messages.jl:185
send_msg_now at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/messages.jl:130 [inlined]
send_msg_now at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/messages.jl:125
deliver_result at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:111
unknown function (ip: 0x7f303d394ab8)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:302 [inlined]
#105 at ./task.jl:356
unknown function (ip: 0x7f303d390b6c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:707
unknown function (ip: (nil))
Allocations: 859407961 (Pool: 859209782; Big: 198179); GC: 217
fish: “julia” terminated by signal SIGSEGV (Address boundary error)
I have no clue where to start producing an MWE because of the "unknown function" frames. Does anyone have a hint as to what might be happening here? I am confused by "serialize" in this report, as none of the code the remote workers execute calls serialize() directly. Could this be arising as a result of remote workers calling remotecall_fetch(deserialize,...)? That is the only thing in the code the remote workers execute that has anything to do with Serialization, so maybe it's something lower level than that.
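For reference, here is a minimal sketch of the call pattern in question. It is not my actual code, and it is not an MWE for the crash. The single local worker, the temporary file, and the small Dict payload are all hypothetical stand-ins. The frames in the trace (deliver_result, send_msg_now, serialize_msg, serialize) suggest the serialization happens when a process sends a call's result back, which would explain seeing "serialize" even though user code only calls deserialize.

```julia
using Distributed, Serialization

# Hypothetical setup: one local worker and a throwaway file on a shared filesystem.
pids = addprocs(1)
@everywhere using Serialization

path = tempname()
serialize(path, Dict("a" => 1, "b" => 2))

# The worker runs deserialize(path). Distributed must then serialize the
# return value in order to send it back to the calling process.
result = remotecall_fetch(deserialize, first(pids), path)

# Clean up the worker and the temporary file.
rmprocs(pids)
rm(path)
```

If this reading is right, the object being serialized at the time of the crash would be the return value of the remote call rather than anything passed to serialize() explicitly.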
julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 2