Skip to content

Conversation

MaskRay
Copy link
Owner

@MaskRay MaskRay commented Jul 31, 2025

Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:

```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```

When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.

To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.

For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.

See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
@MaskRay
Copy link
Owner Author

MaskRay commented Aug 9, 2025

This is a draft o llvm#151639 , which has been merged

@MaskRay MaskRay closed this Aug 9, 2025
MaskRay pushed a commit that referenced this pull request Aug 15, 2025
## Problem

When the new setting

```
set target.parallel-module-load true
```
was added, lldb began fetching modules from the devices from multiple
threads simultaneously. This caused crashes of lldb when debugging on
android devices.

The top of the stack in the crash look something like this:
```
#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
 #1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
 #2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
 #3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
 #4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
 #5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
 llvm#6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
 llvm#7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
 llvm#8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)
```
Our workaround was to set `set target.parallel-module-load ` to `false`
to avoid the crash.

## Background

PlatformAndroid creates two different classes with one stateful adb
connection shared between the two -- one through AdbClient and another
through AdbClient::SyncService. The connection management and state is
complex, and seems to be responsible for the segfault we are seeing. The
AdbClient code resets these connections at times, and re-establishes
connections if they are not active. Similarly, PlatformAndroid caches
its SyncService, which uses an AdbClient class, but the SyncService puts
its connection into a different 'sync' state that is incompatible with a
standard connection.

## Changes in this diff

* This diff refactors the code to (hopefully) have clearer ownership of
the connection, clearer separation of AdbClient and SyncService by
making a new class for clearer separations of concerns, called
AdbSyncService.
* New unit tests are added
* Additional logs were added (see
llvm#145382 (comment)
for details)
MaskRay pushed a commit that referenced this pull request Aug 15, 2025
…namic (llvm#153420)

Canonicalizing the following IR:

```
func.func @mul_zero_dynamic_nofold(%arg0: tensor<?x17xf32>) -> tensor<?x17xf32> {
  %0 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x1xf32>}> : () -> tensor<1x1xf32>
  %1 = "tosa.const"() <{values = dense<0> : tensor<1xi8>}> : () -> tensor<1xi8>
  %2 = tosa.mul %arg0, %0, %1 : (tensor<?x17xf32>, tensor<1x1xf32>, tensor<1xi8>) -> tensor<?x17xf32>
  return %2 : tensor<?x17xf32>
}
```

resulted in a crash

```
#0 0x000056513187e8db backtrace (./build-release/bin/mlir-opt+0x9d698db)                                                                                                                                                                                                                                                                                                                   
 #1 0x0000565131b17737 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:838:8                                                                                                                                                                                                                
 #2 0x0000565131b187f3 PrintStackTraceSignalHandler(void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:918:1                                                                                                                                                                                                                                
 #3 0x0000565131b18c30 llvm::sys::RunSignalHandlers() /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Signals.cpp:105:18                                                                                                                                                                                                                                         
 #4 0x0000565131b18c30 SignalHandler(int, siginfo_t*, void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:409:3                                                                                                                                                                                                                              
 #5 0x00007f2e4165b050 (/lib/x86_64-linux-gnu/libc.so.6+0x3c050)                                                                                                                                                                                                                                                                                                                            
 llvm#6 0x00007f2e416a9eec __pthread_kill_implementation ./nptl/pthread_kill.c:44:76                                                                                                                                                                                                                                                                                                            
 llvm#7 0x00007f2e4165afb2 raise ./signal/../sysdeps/posix/raise.c:27:6                                                                                                                                                                                                                                                                                                                         
 llvm#8 0x00007f2e41645472 abort ./stdlib/abort.c:81:7                                                                                                                                                                                                                                                                                                                                          
 llvm#9 0x00007f2e41645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9                                                                                                                                                                                                                                                                                                                           
llvm#10 0x00007f2e41653ec2 (/lib/x86_64-linux-gnu/libc.so.6+0x34ec2)                                                                                                                                                                                                                                                                                                                            
llvm#11 0x00005651443ec4ba mlir::DenseIntOrFPElementsAttr::getRaw(mlir::ShapedType, llvm::ArrayRef<char>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:1361:3                                                                                                                                                                                    
llvm#12 0x00005651443f1209 mlir::DenseElementsAttr::resizeSplat(mlir::ShapedType) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:0:10                                                                                                                                                                                                              
llvm#13 0x000056513f76f2b6 mlir::tosa::MulOp::fold(mlir::tosa::MulOpGenericAdaptor<llvm::ArrayRef<mlir::Attribute>>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp:0:0
```

from the folder for `tosa::mul` since the zero value was being reshaped
to `?x17` size which isn't supported. AFAIK, `tosa.const` requires all
dimensions to be static. So in this case, the fix is to not to fold the
op.
MaskRay pushed a commit that referenced this pull request Aug 24, 2025
This can happen when JIT code is run, and we can't symbolize those
frames, but they should remain numbered in the stack. An example
spidermonkey trace:

```
    #0 0x564ac90fb80f  (/builds/worker/dist/bin/js+0x240e80f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #1 0x564ac9223a64  (/builds/worker/dist/bin/js+0x2536a64) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #2 0x564ac922316f  (/builds/worker/dist/bin/js+0x253616f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #3 0x564ac9eac032  (/builds/worker/dist/bin/js+0x31bf032) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #4 0x0dec477ca22e  (<unknown module>)
```

Without this change, the following symbolization is output:

```
    #0 0x55a6d72f980f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #1 0x55a6d72f980f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #2 0x55a6d7421a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #3 0x55a6d742116f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #4 0x55a6d80aa032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #4 0x2c803bd8f22e  (<unknown module>)
```

The last frame has a duplicate number. With this change the numbering is
correct:

```
    #0 0x5620c58ec80f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #1 0x5620c58ec80f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #2 0x5620c5a14a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #3 0x5620c5a1416f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #4 0x5620c669d032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #5 0x349f24c7022e  (<unknown module>)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant