Hi,
I encountered a crash today and, after some investigation, I think I have found the cause.
I ran java-profiler using the command below. Running with the Datadog Java agent should trigger this crash too (not tested).
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -agentpath:/root/java-profiler/ddprof-lib/build/lib/main/release/linux/x64/libjavaProfiler.so=start,cpu=10ms,file=/tmp/ap.jfr -cp java Demo (Any Java code can reproduce)
openjdk version "1.8.0_442"
OpenJDK Runtime Environment (build 1.8.0_442-8u442-b06~us1-0ubuntu1~24.04-b06)
OpenJDK 64-Bit Server VM (build 25.442-b06, mixed mode)
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7c4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7c288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff6df6f0b in os::abort(bool) [clone .cold] () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#6 0x00007ffff7757a6d in VMError::report_and_die() () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#7 0x00007ffff759d0fd in JVM_handle_linux_signal () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#8 0x00007ffff759024c in signalHandler(int, siginfo_t*, void*) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#9 <signal handler called>
#10 0x0000000000000000 in ?? ()
#11 0x00007ffff6b64e67 in J9Ext::GetOSThreadID (thread=0x7ffff027b930) at /root/java-profiler/ddprof-lib/src/main/cpp/j9Ext.h:97
#12 VMThread::nativeThreadId (jni=jni@entry=0x7ffff026e260, thread=thread@entry=0x7ffff027b930) at /root/java-profiler/ddprof-lib/src/main/cpp/vmStructs.cpp:713
#13 0x00007ffff6b3a811 in Profiler::updateThreadName (this=this@entry=0x7ffff0005180, jvmti=jvmti@entry=0x7ffff0019930, jni=jni@entry=0x7ffff026e260, thread=thread@entry=0x7ffff027b930, self=self@entry=true) at /root/java-profiler/ddprof-lib/src/main/cpp/profiler.cpp:935
#14 0x00007ffff6b3a92e in Profiler::onThreadStart (this=0x7ffff0005180, jvmti=0x7ffff0019930, jni=0x7ffff026e260, thread=0x7ffff027b930) at /root/java-profiler/ddprof-lib/src/main/cpp/profiler.cpp:111
#15 0x00007ffff73e522e in JvmtiExport::post_thread_start(JavaThread*) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#16 0x00007ffff732c0d8 in JNI_CreateJavaVM () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#17 0x00007ffff7f8b45a in JavaMain () from /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/../lib/amd64/jli/libjli.so
#18 0x00007ffff7f8f961 in call_continuation () from /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/../lib/amd64/jli/libjli.so
#19 0x00007ffff7c9caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#20 0x00007ffff7d29c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
According to the stack trace, something must be wrong with the VMStructs parsing: J9Ext should not be called on OpenJDK 8.
java-profiler/ddprof-lib/src/main/cpp/vmStructs.cpp
Lines 708 to 714 in cc98924
    int VMThread::nativeThreadId(JNIEnv *jni, jthread thread) {
      if (_has_native_thread_id) {
        VMThread *vm_thread = fromJavaThread(jni, thread);
        return vm_thread != NULL ? vm_thread->osThreadId() : -1;
      }
      return J9Ext::GetOSThreadID(thread);
    }
Debugging with a debug build of java-profiler, I found that the symbol gHotSpotVMStructs cannot be found, so VMStructs::initOffsets returns early at vmStructs.cpp:164.
java-profiler/ddprof-lib/src/main/cpp/vmStructs.cpp
Lines 155 to 165 in cc98924
    void VMStructs::initOffsets() {
      uintptr_t entry = readSymbol("gHotSpotVMStructs");
      uintptr_t stride = readSymbol("gHotSpotVMStructEntryArrayStride");
      uintptr_t type_offset = readSymbol("gHotSpotVMStructEntryTypeNameOffset");
      uintptr_t field_offset = readSymbol("gHotSpotVMStructEntryFieldNameOffset");
      uintptr_t offset_offset = readSymbol("gHotSpotVMStructEntryOffsetOffset");
      uintptr_t address_offset = readSymbol("gHotSpotVMStructEntryAddressOffset");
      if (entry == 0 || stride == 0) {
        return;
      }
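The failure chain can be illustrated with a minimal sketch. It assumes that, because initOffsets returns early, _has_native_thread_id stays false, and that the J9 path calls through a function pointer that is only resolved on OpenJ9 (which would be consistent with frame #10 sitting at address 0x0). All names below (`J9ExtSketch`, `get_os_thread_id`, `nativeThreadIdSketch`, `demoFallback`) are hypothetical and are not the profiler's actual API; the null guard is added only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a J9-only entry point resolved at startup.
using GetOsThreadIdFn = int (*)(void *thread);

struct J9ExtSketch {
    // Assumption: stays null on HotSpot, where the J9 extension is never resolved.
    static GetOsThreadIdFn get_os_thread_id;

    static int getOSThreadID(void *thread) {
        // Guard added for illustration; an unguarded call would jump to address 0.
        return get_os_thread_id != nullptr ? get_os_thread_id(thread) : -1;
    }
};

GetOsThreadIdFn J9ExtSketch::get_os_thread_id = nullptr;

// Simplified model of the fallback in VMThread::nativeThreadId.
int nativeThreadIdSketch(bool has_native_thread_id, int hotspot_tid, void *thread) {
    if (has_native_thread_id) {
        return hotspot_tid;  // HotSpot path: offsets were parsed successfully
    }
    return J9ExtSketch::getOSThreadID(thread);  // taken when the offsets were never parsed
}

// Usage: with unparsed offsets on HotSpot, the sketch returns -1 instead of crashing.
void demoFallback() {
    assert(nativeThreadIdSketch(true, 1234, nullptr) == 1234);
    assert(nativeThreadIdSketch(false, 1234, nullptr) == -1);
}
```

A guard like this would only hide the symptom; the thread ID would still be unavailable until the symbol lookup itself succeeds.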
After further debugging, I found that some symbols are skipped at symbols_linux.cpp:357:
    if (_length == 0 || (sym->st_name < _length && sym->st_value < _length)) {
java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp
Lines 350 to 367 in cc98924
    void ElfParser::loadSymbolTable(const char *symbols, size_t total_size,
                                    size_t ent_size, const char *strings) {
      for (const char *symbols_end = symbols + total_size; symbols < symbols_end;
           symbols += ent_size) {
        ElfSymbol *sym = (ElfSymbol *)symbols;
        if (sym->st_name != 0 && sym->st_value != 0) {
          // sanity check the offsets not to exceed the file size
          if (_length == 0 || (sym->st_name < _length && sym->st_value < _length)) {
            // Skip special AArch64 mapping symbols: $x and $d
            if (sym->st_size != 0 || sym->st_info != 0 ||
                strings[sym->st_name] != '$') {
              _cc->add(_base + sym->st_value, (int)sym->st_size,
                       strings + sym->st_name);
            }
          }
        }
      }
    }
In my case, all symbols are stripped from libjvm.so and stored in a separate debug file, which can be installed via apt-get install openjdk-8-dbg.
However, symbols_linux.cpp:357 compares the virtual address offset (i.e. sym->st_value, 0xdd82b8 = 14516920) with the debug file size (i.e. 2675232). Since 14516920 is greater than 2675232, the symbol gHotSpotVMStructs is skipped.
I think the virtual address offset (sym->st_value) should not be compared with the debug file size (_length).
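The comparison can be reproduced in isolation with the values from the gdb session in this report. This is a minimal sketch, not the profiler's code: `passesCurrentCheck` mirrors the condition at symbols_linux.cpp:357, while `passesRelaxedCheck` is a hypothetical variant that only bounds `st_name`, on the assumption that the string-table offset is the only one of the two values that indexes into the mapped file.

```cpp
#include <cstddef>
#include <cstdint>

// Values taken from the gdb session in this report.
constexpr std::size_t kDebugFileLength = 2675232;  // _length (size of the .debug file)
constexpr std::uint32_t kStName = 1685507;         // sym->st_name
constexpr std::uint64_t kStValue = 14516920;       // sym->st_value (0xdd82b8)

// Mirrors the existing condition: both offsets must be below the file size.
constexpr bool passesCurrentCheck(std::size_t length, std::uint32_t st_name,
                                  std::uint64_t st_value) {
    return length == 0 || (st_name < length && st_value < length);
}

// Hypothetical relaxed variant: st_value is a virtual address in the loaded
// library, so only st_name is compared against the file size.
constexpr bool passesRelaxedCheck(std::size_t length, std::uint32_t st_name) {
    return length == 0 || st_name < length;
}

// gHotSpotVMStructs is rejected by the current check but kept by the relaxed one.
static_assert(!passesCurrentCheck(kDebugFileLength, kStName, kStValue),
              "current check skips gHotSpotVMStructs");
static_assert(passesRelaxedCheck(kDebugFileLength, kStName),
              "relaxed check keeps gHotSpotVMStructs");
```

Whether `st_name` should be bounded by the file size or by the string table size is a separate question; the sketch keeps the file-size bound only to stay close to the existing code.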
file /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=734da9c23b83138419928d48e1cc65c9b75facc3, stripped
ls -rtl /usr/lib/debug/.build-id/73/4da9c23b83138419928d48e1cc65c9b75facc3.debug
-rw-r--r-- 1 root root 2675232 Jan 26 15:38 /usr/lib/debug/.build-id/73/4da9c23b83138419928d48e1cc65c9b75facc3.debug
readelf -s -W /usr/lib/debug/.build-id/73/4da9c23b83138419928d48e1cc65c9b75facc3.debug|grep gHotSpotVMStructs
41073: 0000000000dd82b8 8 OBJECT GLOBAL DEFAULT 25 gHotSpotVMStructs
(gdb) p _length
$1 = 2675232
(gdb) p sym->st_name
$3 = 1685507
(gdb) p sym->st_value
$4 = 14516920
(gdb) p/x sym->st_value
$5 = 0xdd82b8
#0 ElfParser::loadSymbolTable (this=0x7ffff7bfd550, symbols=0x7ffff4af0d20 "\003\270\031", total_size=986328, ent_size=24, strings=0x7ffff4af0f60 "") at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:366
#1 0x00007ffff7a9be3d in ElfParser::loadSymbols (this=0x7ffff7bfd550, use_debug=false) at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:258
#2 0x00007ffff7a9b4f8 in ElfParser::parseFile (cc=0x7ffff00d7250, base=0x7ffff6c00000 "\177ELF\002\001\001", file_name=0x7ffff7bfd5f0 "/usr/lib/debug/.build-id/73/4da9c23b83138419928d48e1cc65c9b75facc3.debug", use_debug=false)
at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:85
#3 0x00007ffff7a9c125 in ElfParser::loadSymbolsUsingBuildId (this=0x7ffff7bfe6b0) at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:304
#4 0x00007ffff7a9be65 in ElfParser::loadSymbols (this=0x7ffff7bfe6b0, use_debug=true) at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:263
#5 0x00007ffff7a9b4f8 in ElfParser::parseFile (cc=0x7ffff00d7250, base=0x7ffff6c00000 "\177ELF\002\001\001", file_name=0x7ffff0028349 "/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so", use_debug=true)
at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:85
#6 0x00007ffff7a9cee4 in parseLibrariesCallback (info=0x7ffff7bfe830, size=64, data=0x7ffff7af9e20 <Libraries::instance()::instance>) at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:508
#7 0x00007ffff7d84002 in __GI___dl_iterate_phdr (callback=0x7ffff7a9cb6b <parseLibrariesCallback(dl_phdr_info*, size_t, void*)>, data=0x7ffff7af9e20 <Libraries::instance()::instance>) at ./elf/dl-iteratephdr.c:74
#8 0x00007ffff7a9d172 in Symbols::parseLibraries (array=0x7ffff7af9e20 <Libraries::instance()::instance>, kernel_symbols=false) at /root/java-profiler/ddprof-lib/src/main/cpp/symbols_linux.cpp:550
#9 0x00007ffff7abdaa5 in Libraries::updateSymbols (this=0x7ffff7af9e20 <Libraries::instance()::instance>, kernel_symbols=false) at /root/java-profiler/ddprof-lib/src/main/cpp/libraries.cpp:36
#10 0x00007ffff7a990cc in VM::initShared (vm=0x7ffff79d16c0 <main_vm>) at /root/java-profiler/ddprof-lib/src/main/cpp/vmEntry.cpp:205
#11 0x00007ffff7a99593 in VM::initProfilerBridge (vm=0x7ffff79d16c0 <main_vm>, attach=false) at /root/java-profiler/ddprof-lib/src/main/cpp/vmEntry.cpp:302
#12 0x00007ffff7a9a282 in Agent_OnLoad (vm=0x7ffff79d16c0 <main_vm>, options=0x7ffff0003f40 "start,cpu=10ms,file=/tmp/ap.jfr", reserved=0x0) at /root/java-profiler/ddprof-lib/src/main/cpp/vmEntry.cpp:551
#13 0x00007ffff76f6674 in Threads::create_vm_init_agents() () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#14 0x00007ffff76f92e2 in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#15 0x00007ffff732c010 in JNI_CreateJavaVM () from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#16 0x00007ffff7f8b45a in JavaMain () from /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/../lib/amd64/jli/libjli.so
#17 0x00007ffff7f8f961 in call_continuation () from /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/../lib/amd64/jli/libjli.so
#18 0x00007ffff7c9caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#19 0x00007ffff7d29c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
The test does not crash on Java 11, Java 17 or Java 21, but I think that is just a coincidence: the virtual address offset happens to be smaller than the debug file size on JDK 11, JDK 17 and JDK 21.
I don't know why this check was added. If there is no real case that requires it, the simple fix is to remove the check, and I can submit a PR.
Thanks.