RP2040 (Raspberry Pi Pico) fails to start up - code clobbered or XIP-flash inaccessible? #82632

@DavidCPlatt

Describe the bug

My application is running on a Raspberry Pi Pico. It's been operating successfully for about a year now.

Some of the time-critical portions of the code (signal demodulation) are marked in
CMakeLists.txt for relocation to RAM, to avoid the performance cost of executing in place (XIP) from flash:

  zephyr_code_relocate(FILES src/demod.c LOCATION RAM NOKEEP)
  zephyr_code_relocate(FILES src/biquad.c LOCATION RAM NOKEEP)
  zephyr_code_relocate(FILES src/hdlc.c LOCATION RAM NOKEEP)
  zephyr_code_relocate(FILES ../../drivers/dma/dma_rpi_pico.c LOCATION RAM NOKEEP)

I recently added some new code to these routines. After one quite ordinary addition,
the app stopped starting up at all; even main() was never called. After trying
several alternatives, I found that the failure seems to depend only on the amount
of code, not on its content.

When I attached a debugger (a Black Magic Debug probe) to the Pico and fired up GDB, I
found the application stuck in memmove() during the early stages of kernel startup.
GDB showed what appears to be a memory clobber: a number of instructions in memmove()
and the routines that follow it read as zeros ("movs r0,r0"). With the code "gutted"
like this, startup hangs.

A GDB memory dump seems to show that almost everything from 0x10000000 onwards is
reading as zero... as if the flash has stopped responding.
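
One way to pinpoint where a dump like the "x/64xh" output in the logs turns to zeros is to scan it with a small host-side script. The sketch below is only an illustration and was not part of the original debugging session: the function names are hypothetical, it assumes GDB's usual line layout for "x/...xh" (address, optional <symbol>, colon, then halfwords), and it treats the parsed halfwords as contiguous in memory. A halfword of 0x0000 is what the Thumb disassembler prints as "movs r0, r0".

```python
import re
from typing import List, Tuple

# Matches lines such as: 0x1001620e <memmove>:<TAB>0xb510<TAB>0x4288 ...
_DUMP_LINE = re.compile(r"\s*(0x[0-9a-fA-F]+)\s*(?:<[^>]*>)?:\s*(.*)")


def parse_halfword_dump(text: str) -> List[Tuple[int, int]]:
    """Turn GDB 'x/...xh' output into a list of (address, halfword) pairs."""
    entries = []
    for line in text.splitlines():
        match = _DUMP_LINE.match(line)
        if not match:
            continue  # skip prompts, blank lines and anything else
        address = int(match.group(1), 16)
        for index, token in enumerate(match.group(2).split()):
            # Each halfword occupies 2 bytes after the line's base address.
            entries.append((address + 2 * index, int(token, 16)))
    return entries


def zero_runs(entries: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start_address, halfword_count) for each run of 0x0000 values.

    Assumes the entries are contiguous and in ascending address order.
    """
    runs = []
    start = None
    count = 0
    for address, value in entries:
        if value == 0:
            if start is None:
                start, count = address, 0
            count += 1
        elif start is not None:
            runs.append((start, count))
            start = None
    if start is not None:
        runs.append((start, count))
    return runs
```

Fed the first two lines of the dump in the logs, zero_runs() reports a run of two halfwords at 0x10016224 and a further zero at 0x1001622c, which lines up with the "movs r0, r0" entries in the disassembly.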

If I reset the board into the bootloader and use "picotool" to read back the entire contents
of the flash, the result is a byte-for-byte match with the "zephyr.bin" created by the build,
so as far as I can tell memmove() and the other code are intact in flash. The flash is just
not being read properly during the early memmove() for some reason: both XIP instruction
fetches and GDB memory-dump accesses return zeros for all of these locations.
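
For anyone wanting to repeat the readback check, a comparison along these lines can be scripted. This is a minimal sketch under stated assumptions, not the exact procedure used above: the function names and the default flash base are illustrative, and it assumes the built image is stored starting at flash offset 0, so that a file offset maps to an XIP address by adding 0x10000000. A full-flash readback is normally longer than the image, so only the image-length prefix is compared.

```python
from pathlib import Path
from typing import Optional


def first_mismatch(expected: bytes, readback: bytes) -> Optional[int]:
    """Return the offset of the first byte where the readback differs from
    the expected image, or None if the readback starts with an exact copy."""
    for offset, (want, got) in enumerate(zip(expected, readback)):
        if want != got:
            return offset
    if len(readback) < len(expected):
        # Readback ended early: the first missing byte counts as a mismatch.
        return len(readback)
    return None


def compare_files(image_path: str, readback_path: str,
                  flash_base: int = 0x10000000) -> str:
    """Compare a build image against a flash readback file.

    flash_base is an assumption: the XIP address of flash offset 0.
    """
    expected = Path(image_path).read_bytes()
    readback = Path(readback_path).read_bytes()
    offset = first_mismatch(expected, readback)
    if offset is None:
        return f"match: {len(expected)} bytes identical"
    return (f"mismatch at offset {offset:#x} "
            f"(XIP address {flash_base + offset:#x})")
```

A call such as compare_files("build/zephyr/zephyr.bin", "readback.bin"), where "readback.bin" is a placeholder name for the picotool readback file, returns either a match summary or the first differing offset.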

I've seen this behavior on two different boards (both with vanilla Raspberry Pi Pico modules in
good working condition). I've seen it after two different "download to flash" techniques
(copying the .UF2 file to the "flash disk", or using picotool to talk directly to the bootloader).

Oddly, if I use GDB to run the image (downloading it directly) it reliably starts up OK, and
the code is readable out of flash without error.

Expected behavior

Code is successfully relocated from flash to RAM, and executes properly.

Impact

Showstopper for continuing to add code which must run from RAM for performance reasons.
Possible workaround is to restructure my code to minimize the number of routines
marked as "must be relocated to RAM".

Logs and console output

$ arm-none-eabi-gdb build/zephyr/zephyr.elf
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from build/zephyr/zephyr.elf...
Available Targets:
No. Att Driver
 1      Raspberry RP2040 M0+
 2      Raspberry RP2040 M0+
 3      Raspberry RP2040 Rescue (Attach to reset!) 
0x10016216 in memmove ()
(gdb) bt
#0  0x10016216 in memmove ()
#1  0x10015a0c in z_early_memcpy (dst=<optimized out>, src=<optimized out>, 
    n=<optimized out>) at /home/dplatt/zephyrproject/zephyr/kernel/init.c:210
#2  0x10012124 in data_copy_xip_relocation ()
    at /home/dplatt/zephyrproject/zephyr/apps/tnc/build/zephyr/code_relocation.c:23
#3  0x10011c02 in z_data_copy ()
    at /home/dplatt/zephyrproject/zephyr/kernel/xip.c:49
#4  0x1000e8fc in z_prep_c ()
    at /home/dplatt/zephyrproject/zephyr/arch/arm/core/cortex_m/prep_c.c:195
#5  0x1000e750 in z_arm_reset ()
    at /home/dplatt/zephyrproject/zephyr/arch/arm/core/cortex_m/reset.S:169
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) disassemble
Dump of assembler code for function memmove:
   0x1001620e <+0>:	push	{r4, lr}
   0x10016210 <+2>:	cmp	r0, r1
   0x10016212 <+4>:	bhi.n	0x10016222 <memmove+20>
   0x10016214 <+6>:	movs	r3, #0
=> 0x10016216 <+8>:	cmp	r2, r3
   0x10016218 <+10>:	beq.n	0x1001622c <memmove+30>
   0x1001621a <+12>:	ldrb	r4, [r1, r3]
   0x1001621c <+14>:	strb	r4, [r0, r3]
   0x1001621e <+16>:	adds	r3, #1
   0x10016220 <+18>:	b.n	0x10016216 <memmove+8>
   0x10016222 <+20>:	adds	r3, r1, r2
   0x10016224 <+22>:	movs	r0, r0
   0x10016226 <+24>:	movs	r0, r0
   0x10016228 <+26>:	subs	r2, #1
   0x1001622a <+28>:	bcs.n	0x1001622e <memmove+32>
   0x1001622c <+30>:	movs	r0, r0
   0x1001622e <+32>:	movs	r0, r0
   0x10016230 <+34>:	movs	r0, r0
   0x10016232 <+36>:	movs	r0, r0
End of assembler dump.
(gdb) info registers
r0             0x100001a8          268435880
r1             0x100001a8          268435880
r2             0xa0                160
r3             0x23                35
r4             0x3a                58
r5             0x20041f01          537140993
r6             0x18000000          402653184
r7             0x0                 0
r8             0xffffffff          -1
r9             0xffffffff          -1
r10            0xffffffff          -1
r11            0xffffffff          -1
r12            0x4001801c          1073840156
sp             0x20012de8          0x20012de8 <z_interrupt_stacks+2008>
lr             0x10015a0d          0x10015a0d <z_early_memcpy+6>
pc             0x10016216          0x10016216 <memmove+8>
xpsr           0x1000000           16777216
msp            0x20013f10          0x20013f10 <sys_work_q_stack>
psp            0x20012de8          0x20012de8 <z_interrupt_stacks+2008>
primask        0x1                 1 '\001'
basepri        0x0                 0 '\000'
faultmask      0x0                 0 '\000'
control        0x2                 2 '\002'

(gdb) x/64xh memmove
0x1001620e <memmove>:	0xb510	0x4288	0xd806	0x2300	0x429a	0xd008	0x5ccc	0x54c4
0x1001621e <memmove+16>:	0x3301	0xe7f9	0x188b	0x0000	0x0000	0x3a01	0xd200	0x0000
0x1001622e <memmove+32>:	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000
0x1001623e <memset+10>:	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000
0x1001624e <__file_str_put+10>:	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000
0x1001625e <putc+10>:	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000
0x1001626e <putc+26>:	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000
0x1001627e <__math_uflowf+2>:	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000	0x0000

Environment

  • OS: Debian Linux
  • Toolchain: zephyr-sdk-0.16.8
  • Commit SHA or Version used: commit 8469084 (HEAD -> v4.0.0, tag: v4.0.0, origin/v4.0-branch)

Labels: bug, platform: Raspberry Pi Pico, priority: low
