RP2xxx: put LWIP storage ops in a critical section #10615
Makes sense to me!
I backported this to 9.2.x and tested it with my admin web interface on my systems. The browser opens nine parallel sockets at the same time. No problems, and I cannot see any additional delays.
I ran 10.0.0-beta.3 vs. PR artifacts on a mix of Pico W and Pico 2W, rapidly sending TCP packets up to 1KB back and forth. The 10.0.0-beta.3 Pico W had an odd panic (below) after about an hour, but the PR artifacts have been running all night so I don't see any regressions.
The PR seems like a rational fix, so maybe completely unrelated, but under atypically extreme duress (hammering on TCP socket, and hammering on interrupts ...assuming the code is doing what I think), a TCP client will get hardfaults somewhat regularly with PR artifacts on Pico W:

Client code (on a Pico W):

# Adafruit CircuitPython 10.0.0-beta.3 on 2025-08-29; Raspberry Pi Pico W with rp2040
# Adafruit CircuitPython 10.0.0-beta.3-8-g9e7a03cbc2 on 2025-09-05; Raspberry Pi Pico W with rp2040
# "Hard fault: memory access or instruction error." (after "Connecting")
import time
import supervisor
import microcontroller
import os
import random
import traceback
import wifi
import socketpool
import array
import pulseio
import board

# edit host and port to match server
HOST = "192.168.6.57"
PORT = 5000
TIMEOUT = 1
INTERVAL = 0.1
MAXBUF = 8192

def send_pulses():
    # on off on ...
    MAX_TRAIN = 17
    for p_out in range(1, len(pulse_outs)):
        p_train = []
        for p_item in range(0, random.randint(3, MAX_TRAIN)):
            p_train.append(random.randint(100, 10_000))
        pulses = array.array('H', p_train)
        pulse_outs[p_out].send(pulses)

time.sleep(3)  # wait for serial after reset
print("Connecting to Wifi")
while not wifi.radio.connected:
    try:
        wifi.radio.connect(os.getenv('WIFI_SSID'), os.getenv('WIFI_PASSWORD'))
    except Exception as ex:
        traceback.print_exception(ex, ex, ex.__traceback__)
        time.sleep(1 + random.random())
        supervisor.reload()
        # microcontroller.reset()
pool = socketpool.SocketPool(wifi.radio)
print(wifi.radio.ipv4_address)

# 50% duty cycle at 38kHz.
pulse_outs = []
pulse_outs.append(pulseio.PulseOut(board.GP0, frequency=38000, duty_cycle=32768))
pulse_outs.append(pulseio.PulseOut(board.GP1, frequency=38000, duty_cycle=32768))
pulse_outs.append(pulseio.PulseOut(board.GP2, frequency=38000, duty_cycle=32768))
pulse_outs.append(pulseio.PulseOut(board.GP3, frequency=38000, duty_cycle=32768))

buf = bytearray(MAXBUF)
while True:
    try:
        print("Create TCP Client Socket")
        with pool.socket(pool.AF_INET, pool.SOCK_STREAM) as s:
            s.settimeout(TIMEOUT)
            print("Connecting")
            s.connect((HOST, PORT))
            payload = bytearray()
            for _ in range(random.randint(0, MAXBUF)):
                payload.append(random.choice((b'0123456789ABCDEF')))
            send_pulses()  #####
            size = s.send(payload)
            print("Sent", size, "bytes")
            send_pulses()  #####
            size = s.recv_into(buf)
            # occasionally on recv_into:
            # OSError: [Errno 104] ECONNRESET
            # OSError: [Errno 9] EBADF
            # OSError: [Errno 116] ETIMEDOUT
            print('Received', size, "bytes", buf[:size])
    except Exception as ex:
        print(f"⚠️", end=" ")
        traceback.print_exception(ex, ex, ex.__traceback__)
    time.sleep(INTERVAL)

Server code (on a Pico 2W):

import time
import microcontroller
import os
import random
import traceback
import wifi
import socketpool

HOST = ""
PORT = 5000
TIMEOUT = None
BACKLOG = 2
MAXBUF = 8192

time.sleep(3)  # wait for serial after reset
print("Connecting to Wifi")
while not wifi.radio.connected:
    try:
        wifi.radio.connect(os.getenv('WIFI_SSID'), os.getenv('WIFI_PASSWORD'))
    except Exception as ex:
        traceback.print_exception(ex, ex, ex.__traceback__)
        time.sleep(1)
        microcontroller.reset()
pool = socketpool.SocketPool(wifi.radio)
print(wifi.radio.ipv4_address)

print("Create TCP Server socket", (HOST, PORT))
with pool.socket(pool.AF_INET, pool.SOCK_STREAM) as s:
    s.setsockopt(pool.SOL_SOCKET, pool.SO_REUSEADDR, 1)  #
    s.settimeout(TIMEOUT)
    s.bind((HOST, PORT))
    s.listen(BACKLOG)
    print("Listening")
    buf = bytearray(MAXBUF)
    while True:
        try:
            print("Accepting connections")
            conn, addr = s.accept()
            conn.settimeout(TIMEOUT)
            print("Accepted from", addr)
            size = conn.recv_into(buf, MAXBUF)
            print("Received", buf[:size], size, "bytes")
            conn.send(buf[:size])
            print("Sent", buf[:size], size, "bytes")
        except Exception as ex:
            print(f"⚠️", end=" ")
            traceback.print_exception(ex, ex, ex.__traceback__)
        finally:
            time.sleep(0.1)  # too short and client gets "OSError: [Errno 104] ECONNRESET" in recv
            conn.close()

There also seems to be a situation where the server closing the connection immediately after sending the response can cause
@dhalbert I'm running my original complete code with 9.2.9. I'll report what I see.
Sadly, I'm still seeing:
Exception: [Errno 113] ECONNABORTED <class 'OSError'>
It ran for an hour with no errors. Then, errors started happening on every other request to the post. The GET to the same URL started erroring about 2 hours after startup. When errors started, it would fail every other request. Then, after about 16 hours, every
FWIW, I re-wrote this using the Arduino core in C/C++ and it didn't throw any errors.
#10027 (added in 9.2.5) considerably improved network performance on Pi Pico by using dynamic storage allocation in LWIP. However, it turns out LWIP can do heap operations during interrupts. Thus the operations need to be guarded in critical sections. Otherwise there are storage-related crashes at random intervals, and I saw storage assertion violations in gdb when a crash happened.
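To illustrate the failure mode described above, here is a minimal Python model of an allocator that an interrupt can re-enter between two steps. It is only a sketch of the idea, not the C change in this PR: `ToyHeap`, `demo`, and the one-shot interrupt hook are hypothetical names invented for the illustration, and the "critical section" is modeled as a flag that defers the interrupt handler until the allocation has finished.

```python
class ToyHeap:
    """Toy block allocator whose alloc() can be interrupted between two steps."""

    def __init__(self, num_blocks, use_critical_section):
        self.blocks = list(range(num_blocks))
        self.top = num_blocks            # number of blocks still free
        self.use_critical_section = use_critical_section
        self.irq_enabled = True
        self.irq_handler = None          # one-shot handler fired mid-allocation
        self.pending_irq = None          # handler deferred while IRQs are masked

    def _maybe_interrupt(self):
        handler, self.irq_handler = self.irq_handler, None
        if handler is None:
            return
        if self.irq_enabled:
            handler()                    # runs immediately, mid-operation
        else:
            self.pending_irq = handler   # held until the critical section ends

    def _alloc_steps(self):
        block = self.blocks[self.top - 1]  # step 1: read the next free block
        self._maybe_interrupt()            # an interrupt may arrive right here
        self.top -= 1                      # step 2: mark that block as taken
        return block

    def alloc(self):
        if not self.use_critical_section:
            return self._alloc_steps()
        saved = self.irq_enabled
        self.irq_enabled = False           # enter critical section
        try:
            block = self._alloc_steps()
        finally:
            self.irq_enabled = saved       # leave critical section
        if self.irq_enabled and self.pending_irq is not None:
            handler, self.pending_irq = self.pending_irq, None
            handler()                      # deferred interrupt runs now
        return block


def demo(use_critical_section):
    """Return (block given to main code, block given to the interrupt handler)."""
    heap = ToyHeap(4, use_critical_section)
    irq_blocks = []
    heap.irq_handler = lambda: irq_blocks.append(heap.alloc())
    main_block = heap.alloc()
    return main_block, irq_blocks[0]
```

In this model, `demo(False)` hands the same block to both the main code and the interrupt handler, while `demo(True)` hands out two different blocks because the handler only runs after the first allocation has completed.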
Tested on a Pico W, with a simple test program that did an HTTP fetch from a local server 10 times a second.
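The test program itself is not shown in this thread. A minimal sketch of a paced fetch loop of that kind might look like the following; `fetch_repeatedly` and its parameters are hypothetical, and the actual HTTP call is passed in as a callable so the loop does not assume any particular HTTP library.

```python
import time


def fetch_repeatedly(fetch, count, interval=0.1, sleep=time.sleep):
    """Call fetch() `count` times, pausing `interval` seconds after each call.

    Returns (number of successful fetches, list of (attempt index, exception)).
    Only OSError is caught, since that is the exception type reported here.
    """
    successes = 0
    failures = []
    for attempt in range(count):
        try:
            fetch()
            successes += 1
        except OSError as ex:
            failures.append((attempt, ex))
        sleep(interval)  # 0.1 s gives roughly 10 fetches per second
    return successes, failures
```

On a board, `fetch` would wrap whatever request call the test uses against the local server; recording the attempt index of each failure makes patterns such as "every other request fails" easy to spot.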
On a Pico 2 W, I was unable to reproduce the original problem before the fix, but the fix doesn't seem to break anything either.
@gsexton, @anecdata, and/or @bablokb would you be willing to do some stress testing on this? I did a little other testing but I don't have many interesting test programs. Thanks. And @gsexton could you see if it fixes the original problem you saw?