-
Notifications
You must be signed in to change notification settings - Fork 707
fix(cam_hal): throttle WARN spam so cam_task reduces it's stack footprint #761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(cam_hal): throttle WARN spam so cam_task reduces it's stack footprint #761
Conversation
– prevents stack-overflow in cam_task The old cam_take() used recursion to retry if * no JPEG EOI was found or * a NULL frame pointer was returned (GDMA glitch on ESP32-S3). Under heavy loss conditions this could overflow the cam_task stack and reboot the whole system. * Re-wrote cam_take() as a single while-loop – stack depth is now constant and independent of retry count. * Added strict timeout tracking (`remaining = timeout - elapsed`); function can never block longer than the caller’s timeout. * ESP32-S3 only * capped GDMA reset storms to 3 attempts (`MAX_GDMA_RESETS`) * logs one “giving up” warning, then yields (`vTaskDelay(1)`) to avoid busy-spin after hardware is wedged. * Non-S3 targets * emit early `ESP_LOGW` when a NULL frame ever appears, then yield one tick per loop to prevent CPU thrash. * Maintained existing JPEG-EOI trimming and YUV→GRAY memcpy paths; behaviour unchanged on successful frames. * Inline comment links to esp32-camera commit 984999f / issue espressif#620 for future context.
I can't build now — getting this error:
I'm using ESPHome 2025.7.0-dev
|
Woops :D |
74e41d5
to
4092742
Compare
Background ---------- • Every bad frame produced an ESP_LOGW(): - “Unexpected NULL frame …” (non-S3) - “NO-EOI – JPEG end marker missing” • Each ESP_LOGW() invokes vsnprintf() once ⇒ ~300 B of stack for that call. What’s changed -------------- * First occurrence of each warning: ESP_DRAM_LOGW_ONCE() → zero stack after the first print. * Subsequent occurrences: count with a uint16_t and emit a **literal** ESP_LOGW() only every 100th event (~ 60 B stack usage). * Counters auto-reset at 10 000 to avoid wraparound. * On ESP32-S3 the “Giving up GDMA reset…” message is also moved to ESP_DRAM_LOGW_ONCE() to avoid stack usage. Effects ------- * Occasional diagnostics are still available: first warning and then a summary every 100 events.
4092742
to
c439f08
Compare
Sorry, my brain mixed up the name with the behavior, that it will only fire "once" ;) It should be fixed now under the tag I think I need a break :) |
//currently this is used only for YUV to GRAYSCALE | ||
|
||
if (warn_eoi_miss_cnt == 0) { | ||
ESP_DRAM_LOGW(TAG, "NO-EOI – JPEG end marker missing"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this use might give issues on older cores ESP_DRAM_LOGW
. You have correctly used ESP_LOGW
in the other cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which compatibility does this project need maintain?
Looks like it got worse — now I’m getting the old error too:
And something new as well:
|
Well, looks switching to literal messages wasn't really successful, it seems to still use 300 B of stack space, due to running through ESPHome’s logger helper. However, it now really catches on a different location, because I did not expect a logging flood from But in the end, it may just be a good decision to increase the stack space, what do you think @me-no-dev? |
@turenkomv I've reduced the log spam of cam_verify_jpeg_soi() in PR #760 and create a tag to include the additional changes: But I think we fixed probably most if not all "real" issues with the stack now. And just having the issue that all that log messages don't really fit in such a busy operation in the stack. Please try it anyway, I want to see what's crashing now, so we may be able to reduce the stack usage further. In the end, I think we need to configure just a larger stack usage. Overall, I think I have identified what the fundamental issue is, why your device isn't "working" right now. But I like to take the opportunity to make sure error handling is working better, instead of fixing the root cause why it's not fetching the images properly. Hope you understand. :) |
|
I increased the stack size using this config:
|
@turenkomv thanks! It's too warm to work right now, will look on these later. |
Description
Precondition: Merge of #758
Background
• Every bad frame produced an ESP_LOGW():
• Each ESP_LOGW() invokes vsnprintf() once ⇒ ~300 B of stack for that call.
What’s changed
ESP_DRAM_LOGW_ONCE() → zero stack after the first print.
count with a uint16_t and emit a literal ESP_LOGW() only
every 100th event (~ 60 B stack usage).
ESP_DRAM_LOGW_ONCE() to avoid stack usage.
Effects
summary every 100 events.
Related
Thanks to @turenkomv who reported this issue here.
Testing
@turenkomv would you be so kind and would test this on your hardware? Tag is
fixes_for_m5_0.4
, which is the currently used version in ESPHome with all four patches on top.Checklist
Before submitting a Pull Request, please ensure the following: