ESP32-S3 TWAI Driver Bug: No CAN Bus Transmission

by Alex Johnson

Introduction

This article examines an issue with the TWAI (Two-Wire Automotive Interface) driver on the ESP32-S3 microcontroller: the driver reports message transmissions as successful (ESP_OK), yet the messages are not physically transmitted over the CAN bus. It covers the symptoms, the steps taken to reproduce them, and the potential root causes, for developers and engineers working with the ESP32-S3 and CAN communication.

Problem Description

The core issue lies in the discrepancy between the reported transmission status and the actual physical transmission of CAN messages. The twai_transmit() function, responsible for sending messages, returns ESP_OK almost immediately, often within 0 milliseconds as measured with millis(). Taken alone this is weak evidence: a classical CAN frame with 5 data bytes occupies the bus for only about 0.2 ms at 500 kbit/s (not 5 to 50 ms), and twai_transmit() is documented to return as soon as the frame has been queued, without waiting for it to be sent or acknowledged. What makes the fast return significant is that it coincides with no frames arriving at the receiver. This absence of physical transmission has been observed consistently across the testing scenarios below, making it a significant concern for applications relying on CAN communication via ESP32-S3.
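As a rough cross-check on how long a frame occupies the bus, the duration can be estimated from the frame's bit count. The sketch below is illustrative and not part of the original test code. The function names are hypothetical. It uses the standard bit counts for classical CAN data frames (47 + 8n bits for an 11-bit identifier, 67 + 8n for a 29-bit identifier, both including the 3-bit interframe space) plus the worst-case number of stuff bits.

```cpp
#include <cassert>
#include <cstdint>

// Bits in a classical CAN data frame before bit stuffing, including the
// 3-bit interframe space: 47 + 8*n (11-bit ID) or 67 + 8*n (29-bit ID).
uint32_t frame_bits_nominal(uint32_t data_bytes, bool extended_id) {
    assert(data_bytes <= 8);  // classical CAN carries at most 8 data bytes
    return (extended_id ? 67u : 47u) + 8u * data_bytes;
}

// Worst-case stuff bits: the stuffed region (SOF through CRC) is
// 34 + 8*n bits (standard) or 54 + 8*n bits (extended), and at most one
// stuff bit is added per four bits after the first.
uint32_t stuff_bits_max(uint32_t data_bytes, bool extended_id) {
    assert(data_bytes <= 8);
    return ((extended_id ? 54u : 34u) + 8u * data_bytes - 1u) / 4u;
}

// Upper bound on the time one frame occupies the bus, in microseconds.
uint32_t frame_time_us_max(uint32_t data_bytes, bool extended_id,
                           uint32_t bit_rate) {
    assert(bit_rate > 0);
    uint64_t bits = frame_bits_nominal(data_bytes, extended_id) +
                    stuff_bits_max(data_bytes, extended_id);
    return static_cast<uint32_t>(bits * 1000000ull / bit_rate);
}
```

For the 5-byte, standard-identifier frame used in the sender sketch, this gives at most 105 bits, or about 210 µs at 500 kbit/s, 840 µs at 125 kbit/s, and 4.2 ms at 25 kbit/s. A millis() measurement has 1 ms resolution, so it cannot distinguish a completed 500 kbit/s transmission from a call that merely queued the frame.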

This problem was tested under a range of conditions to pinpoint the cause. Two different Arduino Nano ESP32 boards, both based on the ESP32-S3, were used to ensure the issue wasn't specific to a single device. Multiple SN65HVD230D CAN transceivers were employed to rule out transceiver-related problems, and different GPIO pin combinations were tried to eliminate potential pin configuration conflicts. The issue persisted across baud rates of 500k, 125k, and 25k, and in both NORMAL and NO_ACK modes. Different TWAI libraries were also tried: the native API and ESP32-TWAI-CAN showed the same behavior, while arduino-CAN did not compile for the ESP32-S3. In every configuration that ran, messages were not physically transmitted despite the driver reporting success.

The consistent result across these tests (a 0ms transmission time indicative of no physical output) strongly suggests a fundamental problem within the ESP32-S3's TWAI driver implementation. This behavior undermines the reliability of CAN communication, which is crucial in many industrial and automotive applications.

Expected Behavior vs. Actual Behavior

Expected Behavior

When twai_transmit() is called with a valid CAN message, the following should occur:

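  1. The function should return ESP_OK once the frame is queued, and the frame itself should then occupy the bus for roughly 0.2 ms at 500 kbit/s (longer at lower baud rates or when bus arbitration is lost).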
  2. The CAN message should be physically transmitted on the CAN_H/CAN_L differential lines.
  3. A second ESP32 board (acting as a receiver) connected to the bus should successfully receive the message.
  4. Multiple consecutive transmissions should work reliably without errors.

Actual Behavior

The actual behavior observed deviates significantly from these expectations. The issue manifests differently depending on the TWAI mode:

NORMAL Mode (Two Boards Connected)

In NORMAL mode, with two ESP32-S3 boards connected via the CAN bus, the following pattern is observed:

[0] TX... OK (1ms)     ← First message appears to work
[1] TX... OK (0ms)     ← 0ms = NOT physically sent!
[2] TX... OK (0ms)
[3] TX... OK (0ms)
[4] TX... OK (0ms)
[5] TX... OK (0ms)
[6] TX... ERROR 263 (2000ms)  ← ESP_ERR_TIMEOUT
[7] TX... ERROR 263 (2000ms)

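The initial message transmission seems to complete with a reasonable duration (1ms). However, subsequent transmissions report a 0ms duration. After six accepted messages, the call blocks for the full 2000 ms timeout and returns ESP_ERR_TIMEOUT (error code 263). One possible reading of this pattern, offered as a hypothesis, is that frames are being queued but never completed: TWAI_GENERAL_CONFIG_DEFAULT sets a TX queue length of 5, so one frame held by the controller plus five queued frames would account for exactly six ESP_OK results before the queue is full.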

On the receiving end, the second ESP32 board fails to receive any messages, further confirming the lack of physical transmission on the CAN bus.
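The six-successes-then-timeout pattern in the log can be reproduced with a very small model of a transmit path in which no frame ever completes. This is a simplified sketch for reasoning about the log, not the driver's actual code. The type and function names are hypothetical. The queue capacity of 5 mirrors the TX queue length set by TWAI_GENERAL_CONFIG_DEFAULT.

```cpp
#include <cassert>
#include <cstddef>

enum class TxResult { Ok, Timeout };

// Simplified model (NOT the real driver): one frame can sit in the
// controller's transmit buffer, and further frames wait in a bounded queue.
struct TxModel {
    std::size_t queue_capacity;    // 5 mirrors TWAI_GENERAL_CONFIG_DEFAULT
    bool controller_busy = false;  // a frame is waiting to go out on the bus
    std::size_t queued = 0;        // frames waiting behind it

    explicit TxModel(std::size_t capacity) : queue_capacity(capacity) {}

    // Accept a frame if the controller or the queue has room.
    TxResult transmit() {
        if (!controller_busy) { controller_busy = true; return TxResult::Ok; }
        if (queued < queue_capacity) { ++queued; return TxResult::Ok; }
        return TxResult::Timeout;  // queue full and nothing is draining it
    }

    // Called when the frame in the controller finishes (for example, is ACKed).
    void frame_completed() {
        assert(controller_busy);  // only a pending frame can complete
        if (queued > 0) { --queued; } else { controller_busy = false; }
    }
};

// Number of transmit calls accepted before the first timeout when no
// frame ever completes.
std::size_t accepted_before_timeout(std::size_t capacity) {
    TxModel model(capacity);
    std::size_t accepted = 0;
    while (model.transmit() == TxResult::Ok) { ++accepted; }
    return accepted;
}
```

With a capacity of 5, accepted_before_timeout() returns 6, matching messages [0] through [5] in the log. If this model applies, ESP_OK in the log means "accepted for transmission", and the timeout at message [6] is the first point at which the driver signals that nothing has left the queue.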

NO_ACK Mode (Single Board Loopback)

In NO_ACK mode, which is often used for testing and diagnosis, the observed behavior is as follows:

[0] OK (32 µs)
[1] OK (7 µs)
[2-15] OK (7-11 µs)    ← Works for ~15 messages
[16] ERROR 259 (3 µs)  ← ESP_ERR_INVALID_STATE
[17+] ERROR 259        ← Continues failing

Initially, the transmissions appear to work, with response times in the microsecond range. However, after approximately 15 messages, the system encounters an ESP_ERR_INVALID_STATE (error code 259), indicating an unexpected state transition within the TWAI driver. Subsequent transmission attempts continue to fail with the same error.
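The decimal error numbers in both logs are easier to read in hexadecimal, because the generic ESP-IDF error codes are defined with a base of 0x100. The lookup below is a host-side sketch with a hypothetical function name. It covers only the codes relevant here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a few generic ESP-IDF error codes to their names.
// Illustrative subset only; unknown codes return "UNKNOWN".
std::string esp_err_name_sketch(int32_t code) {
    switch (code) {
        case 0x000: return "ESP_OK";
        case 0x101: return "ESP_ERR_NO_MEM";
        case 0x102: return "ESP_ERR_INVALID_ARG";
        case 0x103: return "ESP_ERR_INVALID_STATE";  // 259 decimal
        case 0x107: return "ESP_ERR_TIMEOUT";        // 263 decimal
        default:    return "UNKNOWN";
    }
}

// Self-check against the two codes that appear in the logs.
void check_logged_codes() {
    assert(esp_err_name_sketch(263) == "ESP_ERR_TIMEOUT");
    assert(esp_err_name_sketch(259) == "ESP_ERR_INVALID_STATE");
}
```

On the device itself, ESP-IDF provides esp_err_to_name() in esp_err.h, which returns the name for any esp_err_t and is the better choice in a sketch.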

Following the occurrence of error 259, an examination of the TWAI status reveals that the system has entered the TWAI_STATE_RECOVERING state. The transmit error counter is elevated, exceeding 128, and the bus error count is continuously increasing, signaling significant communication issues.
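The counter values reported above can be related to the CAN fault confinement rules. A node is error-passive once either error counter exceeds 127, and bus-off once the transmit error counter exceeds 255. A failed transmission normally adds 8 to the transmit error counter. The sketch below encodes only those thresholds. The names are hypothetical, and it deliberately ignores the exceptions in the CAN specification and the driver's automatic retransmissions.

```cpp
#include <cassert>
#include <cstdint>

enum class ErrorState { ErrorActive, ErrorPassive, BusOff };

// Classify a node from its transmit (TEC) and receive (REC) error counters.
ErrorState classify(uint32_t tec, uint32_t rec) {
    if (tec >= 256) { return ErrorState::BusOff; }
    if (tec >= 128 || rec >= 128) { return ErrorState::ErrorPassive; }
    return ErrorState::ErrorActive;
}

// Consecutive failed transmit attempts (+8 each, starting from TEC = 0)
// needed to reach a given TEC value. Simplification: no successful
// transmissions in between, no specification exceptions applied.
uint32_t failed_attempts_to_reach(uint32_t tec_target) {
    assert(tec_target <= 256);  // the TEC is not meaningful beyond bus-off
    return (tec_target + 7u) / 8u;
}
```

Under these assumptions, 16 failed attempts bring the counter to 128 and 32 bring it to 256. The first figure is close to the roughly 15 messages that succeed before errors appear in the log, although attempts and messages are not one-to-one when the controller retransmits automatically.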

These distinct behaviors in NORMAL and NO_ACK modes highlight the complexity of the problem and suggest that the TWAI driver on ESP32-S3 may have issues with both standard bus communication and its error recovery mechanisms.

Steps to Reproduce the Issue

To reliably reproduce this issue, the following hardware setup and software code can be used:

Hardware Setup

The setup involves two Arduino Nano ESP32 boards (or any ESP32-S3 based boards) connected via a CAN bus, along with SN65HVD230 CAN transceivers.

Board 1 (Sender):

  • Arduino Nano ESP32
    • GPIO 3 → SN65HVD230 TX (Pin 1)
    • GPIO 2 → SN65HVD230 RX (Pin 4)
    • GPIO 13 → SN65HVD230 Rs (Pin 8) [set to LOW for high-speed]
    • 3.3V → SN65HVD230 VCC
    • GND → SN65HVD230 GND

Board 2 (Receiver):

  • Same configuration as Board 1

CAN Bus Connection:

The CAN bus connection requires proper termination to ensure signal integrity. The following connections should be made:

Board 1 CAN_H (Pin 7) ──┬── 120Ω ──┬── Board 2 CAN_H
Board 1 CAN_L (Pin 6) ──┤          ├── Board 2 CAN_L
Board 1 GND ────────────┴──────────┴── Board 2 GND
  • Cable: 30cm twisted-pair, 4-wire
  • Termination: 120Ω on both ends (measured ~60Ω total)
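The measured value of about 60Ω is what two 120Ω terminators in parallel should give, so it serves as a quick check that both terminators are present and connected. A minimal sketch of that arithmetic, with a hypothetical function name:

```cpp
#include <cassert>

// Equivalent resistance of two resistors in parallel, in ohms.
double parallel_ohms(double r1, double r2) {
    assert(r1 > 0.0 && r2 > 0.0);  // resistances must be positive
    return (r1 * r2) / (r1 + r2);
}
```

parallel_ohms(120.0, 120.0) evaluates to 60.0. A reading near 120Ω between CAN_H and CAN_L (measured with the bus unpowered) would instead indicate that only one terminator is in circuit.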

Software - Sender Code

The following Arduino code can be used for the sender board:

#include "driver/twai.h"

#define CAN_TX 3
#define CAN_RX 2

void setup() {
  Serial.begin(115200);
  delay(2000);
  
  Serial.println("TWAI Sender Test");
  
  // Configure TWAI
  twai_general_config_t g_config = TWAI_GENERAL_CONFIG_DEFAULT(
    (gpio_num_t)CAN_TX, 
    (gpio_num_t)CAN_RX, 
    TWAI_MODE_NORMAL
  );
  
  twai_timing_config_t t_config = TWAI_TIMING_CONFIG_500KBITS();
  twai_filter_config_t f_config = TWAI_FILTER_CONFIG_ACCEPT_ALL();
  
  // Install and start
  if (twai_driver_install(&g_config, &t_config, &f_config) == ESP_OK) {
    Serial.println("Driver installed");
  } else {
    Serial.println("Driver install FAILED");
    while(1);
  }
  
  if (twai_start() == ESP_OK) {
    Serial.println("Driver started\n");
  } else {
    Serial.println("Driver start FAILED");
    while(1);
  }
}

void loop() {
  static uint32_t counter = 0;
  
  // Prepare message
  twai_message_t msg = {};  // zero-initialize so unused flag bits (e.g. self, ss) start cleared
  msg.identifier = 0x101;
  msg.data_length_code = 5;
  msg.data[0] = 'H';
  msg.data[1] = 'E';
  msg.data[2] = 'L';
  msg.data[3] = 'L';
  msg.data[4] = 'O';
  msg.extd = 0;
  msg.rtr = 0;
  
  // Measure transmission time
  unsigned long start = millis();
  esp_err_t err = twai_transmit(&msg, pdMS_TO_TICKS(2000));
  unsigned long duration = millis() - start;
  
  // Report
  Serial.printf("[%lu] TX... ", counter);
  if (err == ESP_OK) {
    Serial.printf("OK (%lums)\n", duration);  // Shows 0ms after first message!
  } else {
    Serial.printf("ERROR %d (%lums)\n", err, duration);
  }
  
  counter++;
  delay(2000);
}

Software - Receiver Code

The following Arduino code can be used for the receiver board:

#include "driver/twai.h"

#define CAN_TX 3
#define CAN_RX 2

void setup() {
  Serial.begin(115200);
  delay(2000);
  
  Serial.println("TWAI Receiver Test");
  
  twai_general_config_t g_config = TWAI_GENERAL_CONFIG_DEFAULT(
    (gpio_num_t)CAN_TX, 
    (gpio_num_t)CAN_RX, 
    TWAI_MODE_NORMAL
  );
  
  twai_timing_config_t t_config = TWAI_TIMING_CONFIG_500KBITS();
  twai_filter_config_t f_config = TWAI_FILTER_CONFIG_ACCEPT_ALL();
  
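  // Install and start, stopping if either step fails
  if (twai_driver_install(&g_config, &t_config, &f_config) != ESP_OK) {
    Serial.println("Driver install FAILED");
    while(1);
  }
  if (twai_start() != ESP_OK) {
    Serial.println("Driver start FAILED");
    while(1);
  }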
  
  Serial.println("Waiting for messages...\n");
}

void loop() {
  twai_message_t msg;
  
  if (twai_receive(&msg, pdMS_TO_TICKS(100)) == ESP_OK) {
    Serial.println("=== MESSAGE RECEIVED ===");
    Serial.printf("ID: 0x%03X\n", msg.identifier);
    Serial.printf("DLC: %d\n", msg.data_length_code);
    Serial.print("Data: ");
    for (int i = 0; i < msg.data_length_code; i++) {
      Serial.printf("%02X ", msg.data[i]);
    }
    Serial.println("\n========================\n");
  }
}

By using this setup and code, the issue of messages not being transmitted on the CAN bus, despite the driver reporting success, can be reliably reproduced.

Debug Information: Comprehensive Testing and Results

To thoroughly investigate the TWAI driver issue on the ESP32-S3, a series of tests were conducted, each designed to isolate potential causes and narrow down the scope of the problem. These tests included varying GPIO pins, baud rates, and libraries, as well as swapping boards and examining transceiver configurations. The results of these tests provide valuable insights into the nature of the issue.

Test 1: Different GPIO Pins Tested

A common concern when working with microcontrollers is the possibility of conflicts or incorrect configurations related to GPIO pins. To address this, the TWAI communication was tested with different combinations of TX and RX pins on the ESP32-S3. The following table summarizes the results:

TX Pin | RX Pin | Result
GPIO 3 | GPIO 2 | 0ms (no transmission)
GPIO 0 | GPIO 1 | 0ms (no transmission)
GPIO 5 | GPIO 6 | 0ms (no transmission)

The outcome was consistent across all pin combinations: the transmission time was reported as 0ms, and no physical transmission occurred on the CAN bus. This suggests that the issue is not specific to any particular GPIO pins, ruling out pin configuration as the primary cause.

Test 2: Different Baudrates Tested

The baud rate, or data transmission speed, is a critical parameter in CAN communication. To determine if the issue was related to a specific baud rate, tests were performed using a range of common baud rates. The results are shown below:

Baudrate   | Result
500 kbit/s | 0ms transmission
125 kbit/s | 0ms transmission
25 kbit/s  | 0ms transmission

Regardless of the baud rate used, the transmission time remained at 0ms, and no messages were physically transmitted. This indicates that the issue is not tied to the speed of data transmission on the CAN bus.
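For reference, the baud rate of a CAN controller follows from its source clock, the prescaler, and the number of time quanta per bit. The sketch below shows that relationship. The struct and function names are hypothetical, and the 80 MHz source clock and the brp/tseg values in the usage note are assumptions that should be compared against the TWAI_TIMING_CONFIG_* macro definitions in the installed ESP-IDF or Arduino core version.

```cpp
#include <cassert>
#include <cstdint>

// Bit timing parameters, named after the fields of twai_timing_config_t.
struct BitTiming {
    uint32_t brp;     // baud rate prescaler
    uint32_t tseg_1;  // quanta before the sample point (after the sync quantum)
    uint32_t tseg_2;  // quanta after the sample point
};

// Time quanta per bit: 1 sync quantum + tseg_1 + tseg_2.
uint32_t quanta_per_bit(const BitTiming& t) {
    return 1u + t.tseg_1 + t.tseg_2;
}

// Resulting bit rate in bit/s for a given source clock.
uint32_t bit_rate(uint32_t source_clock_hz, const BitTiming& t) {
    assert(t.brp > 0);
    return source_clock_hz / (t.brp * quanta_per_bit(t));
}

// Sample point position within the bit, in tenths of a percent.
uint32_t sample_point_permille(const BitTiming& t) {
    return (1u + t.tseg_1) * 1000u / quanta_per_bit(t);
}
```

Assuming an 80 MHz source clock, brp = 8 with tseg_1 = 15 and tseg_2 = 4 gives 500 kbit/s with the sample point at 80%, brp = 32 with the same segments gives 125 kbit/s, and brp = 128 with tseg_1 = 16 and tseg_2 = 8 gives 25 kbit/s. Both nodes must resolve to the same bit rate for any frame to be acknowledged.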

Test 3: Board Swap Test

To ensure that the problem was not due to a hardware defect in a specific ESP32-S3 board, the sender and receiver roles were swapped between the two boards. The software code was also switched accordingly. This test aimed to determine if the issue followed a particular board or remained consistent regardless of the board's role.

The result of the board swap test was that the same issue persisted on both boards, regardless of whether they were acting as the sender or receiver. This finding strongly suggests that the problem is not a hardware defect specific to an individual board but rather a more systemic issue related to the ESP32-S3 TWAI implementation.

Test 4: SN65HVD230 Rs-Pin Configuration

The SN65HVD230 CAN transceiver has an Rs (Slope Control) pin that can be used to optimize the transceiver's behavior for different bus speeds and conditions. To ensure that the transceiver was configured correctly, the Rs-Pin was explicitly set to LOW, which corresponds to the high-speed mode. The following code snippet was used:

pinMode(13, OUTPUT);
digitalWrite(13, LOW);  // Rs = LOW = High-Speed

Despite explicitly setting the Rs-Pin for high-speed operation, there was no change in the observed behavior. The transmission time remained at 0ms, indicating that the transceiver's configuration was not the root cause of the problem.

Test 5: Different Libraries Tested

To rule out the possibility of the issue being specific to a particular software library, the TWAI communication was tested using multiple libraries. This included the native TWAI API provided by Espressif, as well as third-party libraries designed to simplify CAN communication on the ESP32. The following libraries were tested:

  1. Native TWAI API (driver/twai.h): ❌ 0ms issue
  2. ESP32-TWAI-CAN library (handmade0octopus): ❌ Same 0ms issue
  3. arduino-CAN library (sandeepmistry): ❌ Doesn't compile for ESP32-S3

The results were consistent across the tested libraries: the 0ms transmission issue persisted. The arduino-CAN library, while widely used, does not compile for the ESP32-S3, limiting its applicability in this case.

The fact that the issue occurs across different libraries suggests that the problem lies within the underlying TWAI driver implementation, which is common to all of these libraries.

Conclusion from Debug Information

The comprehensive debugging process, involving various tests and configurations, has provided valuable insights into the TWAI driver issue on the ESP32-S3. The key findings are:

  • The issue is not specific to GPIO pins, baud rates, or individual hardware boards.
  • The SN65HVD230 transceiver configuration is not the root cause.
  • The problem persists across different TWAI libraries, indicating an issue with the underlying driver implementation.

These findings point towards a deeper problem within the ESP32-S3's TWAI peripheral or its driver, suggesting a potential bug in the transmission state machine, incorrect error handling, or GPIO matrix routing issues specific to the S3 variant.

Additional Context and Community Reports

Further evidence of the TWAI driver issue on the ESP32-S3 can be found in various community reports and forum discussions. These reports highlight the widespread nature of the problem and the challenges faced by developers attempting to use CAN communication on this platform.

Community Reports

Multiple forum posts and discussions confirm the existence of TWAI issues on the ESP32-S3. Here are a few examples:

  • https://github.com/espressif/arduino-esp32/discussions/9105: This discussion thread on the Arduino-ESP32 GitHub repository details similar problems with TWAI communication on the ESP32-S3.
  • Stack Overflow reports: Several Stack Overflow questions and answers document issues related to TWAI instability on the ESP32-S3.

These community reports corroborate the findings from the debugging process, indicating that the TWAI driver issue is not an isolated incident but a more widespread problem affecting many users.

Community Workaround

Due to the challenges with TWAI communication on the ESP32-S3, some community members have resorted to using alternative communication methods. A common workaround is to use Modbus TCP over WiFi instead of CAN. This approach leverages the ESP32-S3's WiFi capabilities to establish communication between devices, bypassing the problematic TWAI driver.

While this workaround provides a viable alternative for some applications, it may not be suitable for all scenarios. CAN communication is often preferred in industrial and automotive applications due to its robustness, real-time capabilities, and deterministic behavior. WiFi-based communication, while convenient, may not offer the same level of reliability and determinism.

Comparison with ESP32 Classic

An important observation is that the same code and hardware setup, when used on ESP32 Classic (non-S3) boards, work reliably. This highlights that the issue is specific to the ESP32-S3 TWAI implementation and not a general problem with CAN communication on the ESP32 platform.

This comparison further strengthens the hypothesis that there is a bug or issue within the ESP32-S3 TWAI driver or peripheral that is not present in the classic ESP32.

Suspected Root Cause: Unraveling the Mystery

Based on the observed behavior and the debugging information gathered, a few hypotheses can be formulated regarding the root cause of the TWAI driver issue on the ESP32-S3.

Key Observations

Before delving into the potential root causes, let's recap the key observations:

  1. twai_transmit() returns immediately (0ms) without waiting for physical transmission: This is the most prominent symptom. Because the function is documented to return once the frame has been queued, the fast return matters mainly in combination with the receiver seeing no frames at all.
  2. TX error counter increases after ~15 messages in NO_ACK mode: The increasing transmit error counter suggests that the ESP32-S3 is encountering issues during the transmission process, even in the absence of acknowledgments.
  3. Bus enters RECOVERING state prematurely: The premature entry into the RECOVERING state indicates that the error handling mechanisms within the TWAI driver may be overly sensitive or incorrectly triggered.

Potential Root Causes

Considering these observations, the following potential root causes can be considered:

  1. Bug in the transmission state machine: The TWAI peripheral likely has a state machine that governs the transmission process. A bug in this state machine could cause the driver to prematurely exit the transmission sequence, resulting in the 0ms transmission time and the failure to physically transmit the message.
  2. Incorrect error handling causing premature bus-off: The TWAI driver incorporates error handling mechanisms to deal with bus errors and other communication issues. If these mechanisms are incorrectly implemented, they could lead to a premature bus-off condition, where the device stops transmitting messages altogether. This could explain the increasing TX error counter and the entry into the RECOVERING state.
  3. GPIO matrix routing issues specific to S3: The ESP32-S3 uses a GPIO matrix to map peripheral signals to physical pins. It is possible that there are issues with the GPIO matrix routing specific to the S3 variant, which could interfere with the TWAI communication. However, this is less likely given the consistent behavior across different GPIO pin combinations.

Hypothesis

The most plausible hypothesis is that the ESP32-S3 TWAI peripheral has a bug in its transmission state machine or incorrect error handling logic. This could cause the driver to return prematurely from the twai_transmit() function, without ensuring that the message has been physically transmitted on the CAN bus. The increasing TX error counter and premature bus-off condition could be a consequence of this underlying issue.

Workaround: An Alternative Path

In light of the challenges encountered with the TWAI driver on the ESP32-S3, a workaround has been implemented to maintain communication between devices. This workaround involves using Modbus TCP over WiFi as an alternative communication method.

Modbus TCP over WiFi

Modbus TCP is a widely used industrial communication protocol that operates over TCP/IP networks. By leveraging the ESP32-S3's built-in WiFi capabilities, Modbus TCP can be used to establish communication between devices without relying on the CAN bus.

This approach offers several advantages:

  • Bypasses the TWAI driver issue: By using WiFi-based communication, the problematic TWAI driver is circumvented, allowing for reliable data exchange between devices.
  • Leverages existing infrastructure: Many industrial environments already have WiFi networks in place, making it relatively easy to deploy Modbus TCP-based communication.
  • Flexibility and scalability: WiFi-based communication offers flexibility in terms of network topology and scalability, allowing for easy expansion of the communication network.

Limitations

However, it's important to acknowledge the limitations of this workaround:

  • Not suitable for all applications: CAN communication is often preferred in applications requiring real-time performance, deterministic behavior, and high reliability. WiFi-based communication may not offer the same level of performance and reliability in certain scenarios.
  • Increased complexity: Implementing Modbus TCP over WiFi adds complexity to the system design and requires additional software development effort.

Conclusion

While Modbus TCP over WiFi provides a viable workaround for the TWAI driver issue on the ESP32-S3, it's essential to carefully evaluate the requirements of the application and consider the limitations of this approach. For applications where CAN communication is critical, a proper fix to the TWAI driver is highly desirable.

Request for Assistance and Clarification

Based on the analysis and debugging information presented in this article, the following questions are put to the developers and maintainers of the ESP32-S3 platform:

  1. Confirmation of the issue: Can you confirm that this is a known issue with the ESP32-S3 TWAI driver?
  2. Planned fix: Is there a planned fix for this issue in upcoming ESP-IDF or Arduino-ESP32 releases?
  3. Hardware revisions: Are there specific ESP32-S3 hardware revisions where TWAI works correctly?
  4. Recommendation for users: Should users avoid TWAI on ESP32-S3 and use alternatives (SPI CAN controllers, WiFi)?

Addressing these questions will provide valuable guidance to developers and engineers working with the ESP32-S3 and help them make informed decisions about their communication strategies.

Conclusion

This article has provided a comprehensive overview of a critical issue affecting the TWAI driver on the ESP32-S3 microcontroller. The issue, which involves messages not being physically transmitted on the CAN bus despite the driver reporting success, has been thoroughly analyzed, debugged, and documented.

The findings from this investigation highlight the need for a proper fix to the TWAI driver on the ESP32-S3. In the meantime, the Modbus TCP over WiFi workaround offers a viable alternative for some applications, but it's essential to carefully consider the limitations of this approach.

We hope that this article has provided valuable insights into the TWAI driver issue on the ESP32-S3 and will contribute to a resolution that enables reliable CAN communication on this platform.

For more background on CAN (Controller Area Network), see CAN in Automation (CiA).