Closed
Labels: area: SMP, area: X86_64, bug, priority: medium
Description
I'm seeing some sporadic crashes on x86_64.
These crashes seem to have the following characteristics:
- Instruction pointer (RIP) is NULL
- It seems to happen when main is creating new child threads to run test cases, but I haven't been able to pinpoint where or get a stack trace
Here's an example, but I have seen this occur in a lot of tests:
*** Booting Zephyr OS build zephyr-v2.1.0-238-g5abb770487f7 ***
Running test suite test_sprintf
===================================================================
starting test - test_sprintf_double
SKIP - test_sprintf_double
===================================================================
starting test - test_sprintf_integer
E: ***** CPU Page Fault (error code 0x0000000000000010)
E: Supervisor thread executed address 0x0000000000000000
E: PML4E: 0x000000000011a827 Writable, User, Execute Enabled
E: PDPTE: 0x0000000000119827 Writable, User, Execute Enabled
E: PDE: 0x0000000000118827 Writable, User, Execute Enabled
E: PTE: Non-present
E: RAX: 0x0000000000000008 RBX: 0x0000000000000000 RCX: 0x00000000000f4240 RDX: 0x0000000000000000
E: RSI: 0x0000000000127000 RDI: 0x0000000000002710 RBP: 0x0000000000000000 RSP: 0x0000000000126fb0
E: R8: 0x000000000011cd0c R9: 0x0000000000000000 R10: 0x0000000000000000 R11: 0x0000000000000000
E: R12: 0x0000000001000000 R13: 0x0000000000000000 R14: 0x0000000000000000 R15: 0x0000000000000000
E: RSP: 0x0000000000126fb0 RFLAGS: 0x0000000000000202 CS: 0x0018 CR3: 0x000000000010a000
E: call trace:
E: RIP: 0x0000000000000000
E: NULL base ptr
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 1
E: Current thread: 0x000000000011c8a0 (main)
E: Halting system
I started noticing this after I enabled boot page tables. It's unclear whether my work introduced this or whether the issue was already present, although I'm starting to suspect the latter, since the code I brought in works great for 32-bit.
Because sanitycheck automatically retries failed test cases (see #14173), this has gone undetected in CI.
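For anyone reading the dump above, here is a rough sketch of how the page fault error code and the page table entries decode. It is not Zephyr code: the helper names `decode_pf_error` and `decode_entry` are made up for illustration, and the bit positions are the architectural ones from the Intel SDM as I understand them (error code bit 4 = instruction fetch, entry bit 63 = execute-disable, which only has meaning when EFER.NXE is set).

```python
# Architectural x86 page-fault error code bits (assumed from the Intel SDM).
PF_PRESENT = 1 << 0      # 0 = page not present, 1 = protection violation
PF_WRITE = 1 << 1        # 0 = read access, 1 = write access
PF_USER = 1 << 2         # 0 = supervisor mode, 1 = user mode
PF_RESERVED = 1 << 3     # reserved bit set in a paging structure
PF_INSTR_FETCH = 1 << 4  # fault was caused by an instruction fetch

# Flag bits shared by PML4E/PDPTE/PDE/PTE entries.
ENTRY_PRESENT = 1 << 0
ENTRY_WRITABLE = 1 << 1
ENTRY_USER = 1 << 2
ENTRY_XD = 1 << 63       # execute-disable (requires EFER.NXE)


def decode_pf_error(code):
    """Hypothetical helper: split a page-fault error code into named flags."""
    return {
        "present": bool(code & PF_PRESENT),
        "write": bool(code & PF_WRITE),
        "user": bool(code & PF_USER),
        "reserved_bit": bool(code & PF_RESERVED),
        "instruction_fetch": bool(code & PF_INSTR_FETCH),
    }


def decode_entry(entry):
    """Hypothetical helper: decode the flag bits of one paging-structure entry."""
    return {
        "present": bool(entry & ENTRY_PRESENT),
        "writable": bool(entry & ENTRY_WRITABLE),
        "user": bool(entry & ENTRY_USER),
        "execute_disabled": bool(entry & ENTRY_XD),
    }


if __name__ == "__main__":
    # Values copied from the crash dump in this report.
    print(decode_pf_error(0x0000000000000010))
    print(decode_entry(0x000000000011A827))  # the PML4E
```

With the values from the log, error code 0x10 decodes as a supervisor-mode instruction fetch from a non-present page, which matches the "Supervisor thread executed address 0x0000000000000000" and "PTE: Non-present" lines. In other words, something jumped to or returned to address 0 rather than hitting a permissions problem in the new page tables.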