Skip to content

Handling of keep alive failed. after serval iteration. #63

@NightSkySK

Description

@NightSkySK

Describe the bug
I'm tring to understand nature of the failure and how to fix it for 2 weeks and I can't find the right solution.

After number of itteration I get the status MQTTKeepAliveTimeout

E (449803) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (449803) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (449803) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout

Some improvement I get rising the priority up to 6 for CONFIG_GRI_MQTT_AGENT_TASK_PRIORITY

I've also played with some other settings without major change.
I've traced task stack_size to make sure I'm not exceed there, but it seems to be fine.

I've went trough Open and Closed issues in this repo but I've not found any hint that someone faced similar issue to mine.

System information

  • Hardware board: ESP32
  • IDE used: 4.3
  • Operating System: Windows

Expected behavior
Keep MQTT connected for few hours not only for few minutes

Steps to reproduce bug
Example:
1. "I am using project [ ... ], and have configured with [ ... ]"
2. "When run on [ ... ], I observed that [ ... ]"

Console trace

I (431533) sub_pub_unsub_demo: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 115
W (431913) ota_over_mqtt: Stack size uxOTATask: 888

I (431973) sub_pub_unsub_demo: Subscribe 115 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (431973) sub_pub_unsub_demo: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 116.   
I (431983) sub_pub_unsub_demo: Task "SubPub0" waiting for publish 116 to complete.
W (432913) ota_over_mqtt: Stack size uxOTATask: 888

I (432973) coreMQTT: Publishing message to /filter/SubPub0.

W (433913) ota_over_mqtt: Stack size uxOTATask: 888

I (434203) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (434203) coreMQTT: State record updated. New state=MQTTPublishDone.
I (434203) sub_pub_unsub_demo: Publish 116 succeeded for task "SubPub0".
I (434203) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (434223) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (434223) sub_pub_unsub_demo: Task "SubPub0" received: SubPub0
I (434233) sub_pub_unsub_demo: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 117
W (434913) ota_over_mqtt: Stack size uxOTATask: 888

W (435913) ota_over_mqtt: Stack size uxOTATask: 888

I (436353) sub_pub_unsub_demo: Unsubscribe 117 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (436353) sub_pub_unsub_demo: Task "SubPub0" completed a loop. Delaying before next loop.
W (436363) sub_pub_unsub_demo: Stack size uxSubPubTask: 1060

W (436913) ota_over_mqtt: Stack size uxOTATask: 888

W (437913) ota_over_mqtt: Stack size uxOTATask: 888

W (438913) ota_over_mqtt: Stack size uxOTATask: 888

W (439913) ota_over_mqtt: Stack size uxOTATask: 888

W (440913) ota_over_mqtt: Stack size uxOTATask: 888

I (441363) sub_pub_unsub_demo: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 118
I (441803) sub_pub_unsub_demo: Subscribe 118 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (441803) sub_pub_unsub_demo: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 119.   
I (441813) sub_pub_unsub_demo: Task "SubPub0" waiting for publish 119 to complete.
W (441913) ota_over_mqtt: Stack size uxOTATask: 888

I (442803) coreMQTT: Publishing message to /filter/SubPub0.

W (442913) ota_over_mqtt: Stack size uxOTATask: 888

W (443913) ota_over_mqtt: Stack size uxOTATask: 888

W (444913) ota_over_mqtt: Stack size uxOTATask: 888

W (445913) ota_over_mqtt: Stack size uxOTATask: 888

W (446913) ota_over_mqtt: Stack size uxOTATask: 888

W (447913) ota_over_mqtt: Stack size uxOTATask: 888

W (448913) ota_over_mqtt: Stack size uxOTATask: 888

E (449803) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (449803) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (449803) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout

I (449813) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (449813) core_mqtt_agent_manager: TLS connection was disconnected.
I (449823) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
I (449833) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
W (449913) ota_over_mqtt: Stack size uxOTATask: 888

W (450913) ota_over_mqtt: Stack size uxOTATask: 888

W (451913) ota_over_mqtt: Stack size uxOTATask: 888

W (452243) esp-tls: Failed to open new connection in specified timeout
I (452563) core_mqtt_agent_manager: Retry attempt 1.
W (452913) ota_over_mqtt: Stack size uxOTATask: 888

W (453913) ota_over_mqtt: Stack size uxOTATask: 888

W (454913) ota_over_mqtt: Stack size uxOTATask: 888

E (455613) coreMQTT: CONNACK recv failed with status = MQTTNoDataAvailable.
E (455613) coreMQTT: MQTT connection failed with status = MQTTNoDataAvailable.
I (455613) core_mqtt_agent_manager: Session present: 0

E (455623) core_mqtt_agent_manager: MQTT_Status: MQTTNoDataAvailable
W (455913) ota_over_mqtt: Stack size uxOTATask: 888

I (455943) core_mqtt_agent_manager: Retry attempt 2.
W (456913) ota_over_mqtt: Stack size uxOTATask: 888

I (456913) ota_over_mqtt: OTA agent resumed by timer.
W (457913) ota_over_mqtt: Stack size uxOTATask: 888

I (458463) coreMQTT: MQTT connection established with the broker.
I (458463) core_mqtt_agent_manager: Session present: 1

E (458463) coreMQTT: Collision when adding PacketID=132 at index=0.
I (458473) sub_pub_unsub_demo: coreMQTT-Agent connected.
I (458483) ota_over_mqtt: coreMQTT-Agent connected. Resuming OTA agent.
I (458483) core_mqtt_agent_manager: coreMQTT-Agent connected.
W (458913) ota_over_mqtt: Stack size uxOTATask: 888

I (458913) AWS_OTA: otaPal_GetPlatformImageState
I (458913) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (458913) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (458923) AWS_OTA: Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
I (458943) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (458943) coreMQTT: State record updated. New state=MQTTPubAckSend.
W (458943) sub_pub_unsub_demo: Error or timed out waiting for ack for publish message 119. Re-attempting publish.
I (458953) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (458963) sub_pub_unsub_demo: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 119.   
I (458963) coreMQTT: State record updated. New state=MQTTPublishDone.
I (458983) sub_pub_unsub_demo: Task "SubPub0" waiting for publish 119 to complete.
I (458983) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (459003) sub_pub_unsub_demo: Publish 119 succeeded for task "SubPub0".
I (459003) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (459023) sub_pub_unsub_demo: Task "SubPub0" received: SubPub0
I (459023) sub_pub_unsub_demo: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 120
I (459313) ota_over_mqtt: Subscribed to topic $aws/things/GA913F0001/jobs/notify-next.


I (459313) AWS_OTA: Subscribed to MQTT topic: $aws/things/GA913F0001/jobs/notify-next
W (459923) ota_over_mqtt: Stack size uxOTATask: 888

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4000c250  PS      : 0x00060730  A0      : 0x8018ae5c  A1      : 0x3ffd75e0
A2      : 0x00000001  A3      : 0x00000000  A4      : 0x00007869  A5      : 0x3f42b34c  
A6      : 0x00000008  A7      : 0xff000000  A8      : 0x00000000  A9      : 0x3ffd75b0
A10     : 0x00000000  A11     : 0x3f408be0  A12     : 0x3ffd7834  A13     : 0x3ffb6d1c
A14     : 0x00000000  A15     : 0x00000001  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000001  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd

Backtrace:0x4000c24d:0x3ffd75e0 0x4018ae59:0x3ffd7600 0x4018c4f5:0x3ffd7910 0x4019bc85:0x3ffd7940 0x40096801:0x3ffd7970 0x401062e8:0x3ffd79c0 0x40105e20:0x3ffd79f0 0x40106068:0x3ffd7a40 0x400d94f2:0x3ffd7a70 0x400945b1:0x3ffd7a90
0x4018ae59: _vfprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c:1521

0x4018c4f5: vprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vprintf.c:34 (discriminator 
5)

0x4019bc85: esp_log_writev at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:189

0x40096801: esp_log_write at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:199

0x401062e8: MQTTAgentCommand_Publish at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent_command_functions.c:79 (discriminator 13)

0x40105e20: processCommand at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:584

0x40106068: MQTTAgent_CommandLoop at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:1059

0x400d94f2: prvMQTTAgentTask at C:\mSpaceLock\mSpaceGate-fmw\build/../main/networking/mqtt/core_mqtt_agent_manager.c:519

0x400945b1: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.3.4/components/freertos/port/xtensa/port.c:168



ELF file SHA256: ece89f91691638ee

Rebooting...
I (169237) sub_pub_unsub_demo: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 37
I (169677) sub_pub_unsub_demo: Subscribe 37 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (169687) sub_pub_unsub_demo: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 38.    
I (169697) sub_pub_unsub_demo: Task "SubPub0" waiting for publish 38 to complete.
W (169837) ota_over_mqtt: Stack size uxOTATask: 892

I (170677) coreMQTT: Publishing message to /filter/SubPub0.

W (170837) ota_over_mqtt: Stack size uxOTATask: 892

W (171837) ota_over_mqtt: Stack size uxOTATask: 892

W (172837) ota_over_mqtt: Stack size uxOTATask: 892

W (173837) ota_over_mqtt: Stack size uxOTATask: 892

W (174837) ota_over_mqtt: Stack size uxOTATask: 892

E (175687) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (175687) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (175687) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout

I (17

Backtrace:0x400D9533:0x3FFD7F60 0x400945B1:0x3FFD7F90
0x400d9533: prvMQTTAgentTask at C:\mSpaceLock\mSpaceGate-fmw\build/../main/networking/mqtt/core_mqtt_agent_manager.c:536

0x400945b1: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.3.4/components/freertos/port/xtensa/port.c:168


5697) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (175707) core_mqtt_agent_manager: ../main/networking/mqtt/core_mqtt_agent_manager.c 537

I (175717) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
I (175707) core_mqtt_agent_manager: TLS connection was disconnected.
I (175727) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
W (175837) ota_over_mqtt: Stack size uxOTATask: 892

W (175837) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (175837) AWS_OTA: OTA Agent is suspended.
I (175837) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
W (177837) ota_over_mqtt: Stack size uxOTATask: 892

W (178437) esp-tls: Failed to open new connection in specified timeout
I (178837) core_mqtt_agent_manager: Retry attempt 1.
W (178837) ota_over_mqtt: Stack size uxOTATask: 892

W (179837) ota_over_mqtt: Stack size uxOTATask: 892

W (180837) ota_over_mqtt: Stack size uxOTATask: 892

W (181837) ota_over_mqtt: Stack size uxOTATask: 892

I (182787) coreMQTT: MQTT connection established with the broker.
I (182787) core_mqtt_agent_manager: Session present: 1

E (182787) coreMQTT: Collision when adding PacketID=43 at index=0.
I (182807) sub_pub_unsub_demo: coreMQTT-Agent connected.
I (182807) ota_over_mqtt: coreMQTT-Agent connected. Resuming OTA agent.
I (182807) core_mqtt_agent_manager: coreMQTT-Agent connected.
W (182837) ota_over_mqtt: Stack size uxOTATask: 892

I (182837) AWS_OTA: otaPal_GetPlatformImageState
I (182837) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (182837) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (182847) AWS_OTA: Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
I (182867) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (182867) coreMQTT: State record updated. New state=MQTTPubAckSend.
W (182867) sub_pub_unsub_demo: Error or timed out waiting for ack for publish message 38. Re-attempting publish.
I (182877) sub_pub_unsub_demo: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 38.
I (182897) sub_pub_unsub_demo: Task "SubPub0" waiting for publish 38 to complete.
I (182997) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (182997) coreMQTT: State record updated. New state=MQTTPublishDone.
I (182997) sub_pub_unsub_demo: Publish 38 succeeded for task "SubPub0".
I (182997) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (183017) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (183017) sub_pub_unsub_demo: Task "SubPub0" received: SubPub0
I (183027) sub_pub_unsub_demo: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 39
I (183197) ota_over_mqtt: Subscribed to topic $aws/things/GA913F0001/jobs/notify-next.


I (183197) AWS_OTA: Subscribed to MQTT topic: $aws/things/GA913F0001/jobs/notify-next
W (183847) ota_over_mqtt: Stack size uxOTATask: 892

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4000c250  PS      : 0x00060130  A0      : 0x8018ae84  A1      : 0x3ffd7ad0
A2      : 0x00000001  A3      : 0x00000000  A4      : 0x00007869  A5      : 0x3f42b39c  
A6      : 0x00000008  A7      : 0xff000000  A8      : 0x00000000  A9      : 0x3ffd7aa0  
A10     : 0x00000000  A11     : 0x3f408c34  A12     : 0x800819de  A13     : 0x3ffbea50
A14     : 0x00000000  A15     : 0x00000001  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000001  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd

Backtrace:0x4000c24d:0x3ffd7ad0 0x4018ae81:0x3ffd7af0 0x4018c51d:0x3ffd7e00 0x4019bcad:0x3ffd7e30 0x40096801:0x3ffd7e60 0x40106310:0x3ffd7eb0 0x40105e48:0x3ffd7ee0 0x40106090:0x3ffd7f30 0x400d94fa:0x3ffd7f60 0x400945b1:0x3ffd7f90
0x4018ae81: _vfprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c:1521

0x4018c51d: vprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vprintf.c:34 (discriminator 
5)

0x4019bcad: esp_log_writev at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:189

0x40096801: esp_log_write at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:199

0x40106310: MQTTAgentCommand_Publish at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent_command_functions.c:79 (discriminator 13)

0x40105e48: processCommand at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:584

0x40106090: MQTTAgent_CommandLoop at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:1059

0x400d94fa: prvMQTTAgentTask at C:\mSpaceLock\mSpaceGate-fmw\build/../main/networking/mqtt/core_mqtt_agent_manager.c:520

0x400945b1: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.3.4/components/freertos/port/xtensa/port.c:168



ELF file SHA256: 5316e8bbe4264236

Rebooting...

Additional context
As a side question I wonder why in this example we start demo tasks before starting WiFi and coreMQTT-Agent network manager
I found another example where it's done other way around like github.com/aws-samples/device-software-for-aws-connected-mobility-solution What are the pros and cons of those two approach?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions