
@dhalbert (Collaborator) commented Sep 5, 2025

#10027 (added in 9.2.5) considerably improved network performance on the Pi Pico by switching LWIP to dynamic (heap) storage allocation. However, it turns out LWIP can perform heap operations during interrupts, so those operations need to be guarded by critical sections. Otherwise, storage-related crashes occur at random intervals; I saw storage assertion violations in gdb when a crash happened.
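The underlying hazard is allocator bookkeeping being updated from two execution contexts at once. As a host-side Python analogy only (not CircuitPython or LWIP code), the sketch below wraps every heap operation in a critical section. `ToyHeap`, `critical_section`, and `churn` are hypothetical names, and a `threading.RLock` stands in for disabling interrupts.

```python
import threading
from contextlib import contextmanager


class ToyHeap:
    """Hypothetical allocator whose bookkeeping must never be updated concurrently."""

    def __init__(self, size):
        self.size = size
        self.used = 0
        self.blocks = {}  # block id -> size in bytes
        self.next_id = 0
        # Stand-in for "disable interrupts": only one context may touch the heap at a time.
        self._lock = threading.RLock()

    @contextmanager
    def critical_section(self):
        with self._lock:
            yield

    def malloc(self, n):
        with self.critical_section():
            if self.used + n > self.size:
                return None  # out of memory
            block_id = self.next_id
            self.next_id += 1
            self.blocks[block_id] = n
            self.used += n
            return block_id

    def free(self, block_id):
        with self.critical_section():
            self.used -= self.blocks.pop(block_id)

    def consistent(self):
        """True if the running total matches the per-block records."""
        with self.critical_section():
            return self.used == sum(self.blocks.values())


def churn(heap, rounds):
    """Allocate and free repeatedly, roughly like a network stack handling packets."""
    for _ in range(rounds):
        block = heap.malloc(16)
        if block is not None:
            heap.free(block)


heap = ToyHeap(4096)
# Several threads play the role of main code plus interrupt handlers.
workers = [threading.Thread(target=churn, args=(heap, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(heap.used, heap.consistent())  # 0 True
```

Without the `with self.critical_section():` lines, `self.used += n` becomes an unprotected read-modify-write that another context can interleave with. A lock is only an analogy, though: on a single core, an interrupt handler that blocked waiting for a lock held by the code it interrupted would never get it, since the interrupted code cannot resume until the handler returns. That is why a critical section on a microcontroller typically masks interrupts instead.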

Tested on a Pico W, with a simple test program that did an HTTP fetch from a local server 10 times a second.

On a Pico 2 W, I was unable to reproduce the original problem even without the fix, but the fix doesn't seem to break anything either.
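A host-side sketch of the kind of test described above, assuming CPython on a PC rather than CircuitPython on the board: a tiny local HTTP server built with the standard `http.server` module, plus a client loop that fetches from it at a fixed rate (10 Hz by default). `start_local_server` and `fetch_repeatedly` are hypothetical helpers, not the program that was actually used.

```python
import http.server
import threading
import time
import urllib.request

# Ignore any proxy settings from the environment; we only talk to localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET with a short fixed body."""

    def do_GET(self):
        body = b"hello from the local server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet during long runs


def start_local_server():
    """Serve on an OS-chosen free port in a background thread; return (server, url)."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


def fetch_repeatedly(url, count, rate_hz=10.0):
    """Fetch url `count` times, spacing request starts 1/rate_hz seconds apart."""
    period = 1.0 / rate_hz
    next_start = time.monotonic()
    bodies = []
    for _ in range(count):
        with _OPENER.open(url, timeout=5) as resp:
            bodies.append(resp.read())
        next_start += period
        time.sleep(max(0.0, next_start - time.monotonic()))
    return bodies


server, url = start_local_server()
start = time.monotonic()
bodies = fetch_repeatedly(url, count=20)
elapsed = time.monotonic() - start
server.shutdown()
server.server_close()
print(len(bodies), f"{elapsed:.1f}s")
```

Pacing from a running `next_start` deadline, rather than sleeping a fixed 0.1 s after each fetch, keeps the request rate steady even when individual fetches take a variable amount of time.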

@gsexton, @anecdata, and/or @bablokb, would you be willing to do some stress testing on this? I did a little other testing, but I don't have many interesting test programs. Thanks. And @gsexton, could you check whether it fixes the original problem you saw?

@dhalbert requested review from tannewt and anecdata September 5, 2025 19:16
@tannewt (Member) left a comment

Makes sense to me!

@dhalbert merged commit a96f112 into adafruit:main Sep 6, 2025
162 checks passed
@dhalbert deleted the rp2-lwip-storage-critical-section branch September 6, 2025 00:53

@bablokb commented Sep 6, 2025

@bablokb would you be willing to do some stress testing on this

I backported this to 9.2.x and tested it with my admin-webinterface on my systems, where the browser opens nine sockets in parallel. No problems, and I can't see any additional delays.
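The browser's behaviour (several connections opened at once) can be approximated from a PC with a thread pool. This sketch assumes CPython and a throwaway local server so that it runs on its own; pointing it at a board's web interface instead is left as an assumption, and `fetch` and `parallel_round` are hypothetical helpers.

```python
import concurrent.futures
import http.server
import threading
import urllib.request

# Ignore any proxy settings from the environment; we only talk to localhost.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class PathEchoHandler(http.server.BaseHTTPRequestHandler):
    """Replies with the request path, so each response can be matched to its request."""

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # no per-request logging


def fetch(url):
    with OPENER.open(url, timeout=5) as resp:
        return resp.read().decode()


def parallel_round(base_url, n=9):
    """Open n connections at once, like a browser loading a page's assets."""
    urls = [f"{base_url}asset{i}" for i in range(n)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as executor:
        return list(executor.map(fetch, urls))  # results keep request order


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PathEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}/"
results = parallel_round(base_url)
server.shutdown()
server.server_close()
print(results[0], len(results))  # /asset0 9
```

Looping over `parallel_round` while timing each round is one way to check for the "additional delays" mentioned above.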

@anecdata (Member) commented Sep 6, 2025

I ran 10.0.0-beta.3 and the PR artifacts side by side on a mix of Pico W and Pico 2 W boards, rapidly sending TCP packets of up to 1 KB back and forth. The Pico W running 10.0.0-beta.3 had an odd panic (below) after about an hour, but the boards with the PR artifacts have been running all night, so I don't see any regressions.

Create TCP Client Socket
Connecting

*** PANIC ***

tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAI
[01:00:58.400] Disconnected
[01:00:59.410] Warning: Could not open tty device (No such file or directory)
[01:00:59.410] Waiting for tty device..
[01:01:01.491] Connected
y access or instruction error.
Please file an issue with your program at github.com/adafruit/circuitpython/issues.
Press reset to exit safe mode.
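The back-and-forth TCP test described above can be approximated on a PC, assuming CPython sockets on loopback and an echo server in a background thread (the real test ran between boards). Payload sizes are random between 1 byte and 1 KB. `recv_exactly` keeps reading until the whole echo has arrived, since a single `recv` may legitimately return fewer bytes than were sent. All helper names are hypothetical.

```python
import random
import socket
import threading


def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv() may return short reads."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed before sending all data")
        data += chunk
    return bytes(data)


def echo_forever(listener):
    """Echo everything each client sends until that client closes."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed: stop serving
        with conn:
            while chunk := conn.recv(4096):
                conn.sendall(chunk)


def soak(port, rounds, max_size=1024, seed=0):
    """Send random payloads on fresh connections; return how many echoed back intact."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(rounds):
        payload = bytes(rng.getrandbits(8) for _ in range(rng.randint(1, max_size)))
        with socket.create_connection(("127.0.0.1", port), timeout=5) as s:
            s.sendall(payload)
            if recv_exactly(s, len(payload)) == payload:
                ok += 1
    return ok


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen(2)
threading.Thread(target=echo_forever, args=(listener,), daemon=True).start()
ok = soak(listener.getsockname()[1], rounds=50)
listener.close()
print(ok)  # 50
```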

@anecdata (Member) commented Sep 6, 2025

The PR seems like a rational fix, so this may be completely unrelated, but under atypically extreme duress (hammering on a TCP socket while also hammering on interrupts, assuming the code does what I think it does), a TCP client on a Pico W running the PR artifacts gets hard faults somewhat regularly:

Client code (on a Pico W):
# Adafruit CircuitPython 10.0.0-beta.3 on 2025-08-29; Raspberry Pi Pico W with rp2040
# Adafruit CircuitPython 10.0.0-beta.3-8-g9e7a03cbc2 on 2025-09-05; Raspberry Pi Pico W with rp2040
# "Hard fault: memory access or instruction error." (after "Connecting")

import time
import supervisor
import microcontroller
import os
import random
import traceback
import wifi
import socketpool
import array
import pulseio
import board

# edit host and port to match server
HOST = "192.168.6.57"
PORT = 5000
TIMEOUT = 1
INTERVAL = 0.1
MAXBUF = 8192

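def send_pulses():
    # Send a random pulse train (on, off, on, ...) on the PulseOuts.
    # Note: the loop starts at index 1, so the PulseOut on GP0 is never used.
    MAX_TRAIN = 17
    for p_out in range(1, len(pulse_outs)):
        p_train = []
        for _ in range(random.randint(3, MAX_TRAIN)):
            p_train.append(random.randint(100, 10_000))  # pulse length in microseconds
        pulses = array.array('H', p_train)
        pulse_outs[p_out].send(pulses)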

time.sleep(3)  # wait for serial after reset

print("Connecting to Wifi")
while not wifi.radio.connected:
    try:
        wifi.radio.connect(os.getenv('WIFI_SSID'), os.getenv('WIFI_PASSWORD'))
    except Exception as ex:
        traceback.print_exception(ex, ex, ex.__traceback__)
        time.sleep(1 + random.random())
        supervisor.reload()
        # microcontroller.reset()
pool = socketpool.SocketPool(wifi.radio)
print(wifi.radio.ipv4_address)

# 50% duty cycle at 38kHz.
pulse_outs = []
pulse_outs.append(pulseio.PulseOut(board.GP0, frequency=38000, duty_cycle=32768))
pulse_outs.append(pulseio.PulseOut(board.GP1, frequency=38000, duty_cycle=32768))
pulse_outs.append(pulseio.PulseOut(board.GP2, frequency=38000, duty_cycle=32768))
pulse_outs.append(pulseio.PulseOut(board.GP3, frequency=38000, duty_cycle=32768))

buf = bytearray(MAXBUF)
while True:
    try:
        print("Create TCP Client Socket")
        with pool.socket(pool.AF_INET, pool.SOCK_STREAM) as s:
            s.settimeout(TIMEOUT)

            print("Connecting")
            s.connect((HOST, PORT))

            payload = bytearray()
            for _ in range(random.randint(0, MAXBUF)):  # 0..MAXBUF random hex digits
                payload.append(random.choice(b'0123456789ABCDEF'))

            send_pulses()  #####

            size = s.send(payload)
            print("Sent", size, "bytes")

            send_pulses()  #####

            size = s.recv_into(buf)
            # occasionally on recv_into:
            # OSError: [Errno 104] ECONNRESET
            # OSError: [Errno 9] EBADF
            # OSError: [Errno 116] ETIMEDOUT

            print('Received', size, "bytes", buf[:size])
    except Exception as ex:
        print("⚠️", end=" ")
        traceback.print_exception(ex, ex, ex.__traceback__)

    time.sleep(INTERVAL)
Server code (on a Pico 2 W):
import time
import microcontroller
import os
import random
import traceback
import wifi
import socketpool

HOST = ""
PORT = 5000
TIMEOUT = None
BACKLOG = 2
MAXBUF = 8192

time.sleep(3)  # wait for serial after reset

print("Connecting to Wifi")
while not wifi.radio.connected:
    try:
        wifi.radio.connect(os.getenv('WIFI_SSID'), os.getenv('WIFI_PASSWORD'))
    except Exception as ex:
        traceback.print_exception(ex, ex, ex.__traceback__)
        time.sleep(1)
        microcontroller.reset()
pool = socketpool.SocketPool(wifi.radio)
print(wifi.radio.ipv4_address)

print("Create TCP Server socket", (HOST, PORT))
with pool.socket(pool.AF_INET, pool.SOCK_STREAM) as s:
    s.setsockopt(pool.SOL_SOCKET, pool.SO_REUSEADDR, 1)  # allow rebinding the port while an old connection is in TIME_WAIT
    s.settimeout(TIMEOUT)

    s.bind((HOST, PORT))
    s.listen(BACKLOG)
    print("Listening")

    buf = bytearray(MAXBUF)
    while True:
        conn = None  # stays None if accept() fails, so finally won't touch a missing or stale socket
        try:
            print("Accepting connections")
            conn, addr = s.accept()
            conn.settimeout(TIMEOUT)
            print("Accepted from", addr)

            size = conn.recv_into(buf, MAXBUF)
            print("Received", buf[:size], size, "bytes")

            conn.send(buf[:size])
            print("Sent", buf[:size], size, "bytes")

        except Exception as ex:
            print("⚠️", end=" ")
            traceback.print_exception(ex, ex, ex.__traceback__)
        finally:
            time.sleep(0.1)  # too short and client gets "OSError: [Errno 104] ECONNRESET" in recv
            if conn is not None:
                conn.close()
With beta.3, hard faults occur under less duress than this, but I hadn't actually reproduced them with the testing in my previous comment, so I just wanted to see how they could be triggered.

There also seems to be a situation where the server closing the connection immediately after sending its response can cause OSError: [Errno 104] ECONNRESET at the client, somehow before the client has read all of the incoming data (a buffering issue?).
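One plausible explanation, not confirmed in this thread: many TCP stacks send a reset (RST) instead of a normal FIN when a socket is closed while received data is still unread, and the peer then reports ECONNRESET and may discard data it had not yet read. In the server above, a single `recv_into(buf, MAXBUF)` may return only part of a payload of up to 8192 bytes, leaving the rest unread at `conn.close()`. A common mitigation is a graceful close: half-close with `shutdown(SHUT_WR)`, drain until the peer closes, then close. This sketch uses the standard CPython `socket` API on loopback; whether CircuitPython's `socketpool` offers the same calls is not assumed, and `graceful_close`, `echo_server_once`, and `echo_client` are hypothetical names.

```python
import socket
import threading


def graceful_close(conn, bufsize=1024):
    """Close a TCP connection without letting unread data turn the close into a reset."""
    conn.shutdown(socket.SHUT_WR)  # send FIN: we are done sending
    while conn.recv(bufsize):      # discard leftovers until the peer closes (b"")
        pass
    conn.close()


def echo_server_once(listener, maxbuf=8192):
    """Accept one connection, echo a single recv() worth of data, close gracefully."""
    conn, _ = listener.accept()
    conn.settimeout(5)
    data = conn.recv(maxbuf)  # may be only part of what the client sent
    conn.sendall(data)
    graceful_close(conn)


def echo_client(port, payload):
    """Send payload, then read the reply until the server's FIN."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as s:
        s.sendall(payload)
        chunks = []
        while True:
            chunk = s.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=echo_server_once, args=(listener,))
server.start()
payload = bytes(range(256)) * 4
echoed = echo_client(port, payload)
server.join()
listener.close()
print(len(echoed) > 0 and payload.startswith(echoed))  # True
```

Even if the server's single `recv` reads only part of the payload, the client still receives that echoed prefix followed by a clean end-of-stream, because the unread remainder is drained before `close()` rather than left in the receive buffer.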

@gsexton commented Sep 7, 2025

@dhalbert, I'm running my original, complete code with 9.2.9. I'll report what I see.

@gsexton commented Sep 8, 2025

Sadly, I'm still seeing:

Exception: [Errno 113] ECONNABORTED <class 'OSError'>

It ran for an hour with no errors. Then errors started happening on every other POST request. The GET to the same URL started erroring about 2 hours after startup. Once errors started, every other request would fail. Then, after about 16 hours, every

FWIW, I rewrote this in C/C++ using the Arduino core, and it didn't throw any errors.

Successfully merging this pull request may close this issue: PicoW [Errno 113] ECONNABORTED <class 'OSError'>

5 participants