SRS V7.0.128: Bug In SRT Server EOF And Timeout Handling
Introduction
This article examines a bug reported against SRS (Simple Realtime Server) version 7.0.128 in the SRT (Secure Reliable Transport) server's handling of EOF (end-of-file) notifications and timeout settings. The issue affects connection closures and idle streams, so it matters to developers and users who rely on SRS for SRT streaming.
SRT is a protocol designed to carry high-quality video reliably and securely across unpredictable networks. The bug shows up in two ways: the SRT server does not notify subscribing clients when the publisher's connection ends, and connections are not closed when the configured timeouts elapse. As a result, clients do not terminate their streams and resources are held longer than necessary, which affects overall system performance.
The Bug: EOF Notification and Timeout Issues
The core of the issue lies in how the SRS SRT server manages connection terminations and idle stream handling. When a publishing client abruptly closes a connection, the server should ideally notify all subscribing clients about the EOF condition. This notification allows clients to gracefully terminate their streams and release resources. However, in version 7.0.128, this EOF notification mechanism appears to be faulty. Furthermore, the server's timeout configuration, designed to automatically close idle connections, doesn't function as expected. This means that clients might remain connected indefinitely, even when no data is being transmitted, leading to resource wastage and potential stability issues. This discrepancy between the expected behavior and the actual implementation can create significant challenges in managing SRT streams, especially in scenarios where connections are frequently established and terminated.
Scenario and Reproduction Steps
The bug was initially reported with a setup that uses FFmpeg to publish an SRT stream and srt-live-transmit to consume it. The server was configured with explicit SRT settings, including latency, buffer sizes, and timeout values. When the publishing client (FFmpeg) was terminated prematurely, the consuming client (srt-live-transmit) did not recognize the connection closure and kept waiting for data well beyond the configured timeout periods. The same steps can be used to check whether your own setup is affected.
To reproduce the issue, consider the following scenario:
- SRT Server Configuration: The SRS server is configured with the following SRT settings:

      srt_server {
          enabled on;
          listen 1935;
          maxbw -1;
          mss 1456;
          latency 120;
          recvlatency 120;
          peerlatency 120;
          tlpktdrop on;
          sendbuf 4194304;
          recvbuf 4194304;
          peer_idle_timeout 8000;
          connect_timeout 5000;
          tsbpdmode on;
          default_app live;
      }

  These settings define the behavior of the SRT server, including buffer sizes, latency parameters, and crucial timeout values like peer_idle_timeout. The peer_idle_timeout setting, in particular, is expected to automatically close connections that have been idle for a specified duration. However, as we will see, this setting does not function as anticipated in the presence of this bug.
- Publishing Stream with FFmpeg: FFmpeg is used to publish an SRT stream to the server:

      ffmpeg -re -i test.ts -c copy -f mpegts 'srt://127.0.0.1:1935?streamid=#!::h=/live/test,m=publish'

  This command instructs FFmpeg to read the test.ts file and stream it to the SRS server using the SRT protocol. The streamid parameter specifies the stream's endpoint on the server. This publishing step sets the stage for the bug to manifest when the connection is closed.
- Consuming Stream with srt-live-transmit: The srt-live-transmit tool is used to consume the stream from the server:

      srt-live-transmit -nokm 'srt://127.0.0.1:1935?streamid=#!::h=live/test,m=request&recv_timeout=3000&idle_timeout=3000&send_timeout=3000' file://con:stdout > /dev/null

  This command configures srt-live-transmit to connect to the SRT server and receive the stream. The crucial part here is the inclusion of recv_timeout and idle_timeout parameters, which are intended to set the maximum time the client will wait for data before closing the connection. These timeouts are at the heart of the bug, as they are not properly honored by the SRS server.
- Reproducing the Bug: Start both the FFmpeg publishing process and the srt-live-transmit consuming process. After approximately two minutes, terminate the FFmpeg process (the publishing side). The expected behavior is that srt-live-transmit should detect the connection closure and exit within the timeout period (3 seconds, as configured). However, in the presence of the bug, srt-live-transmit will continue to wait indefinitely, far exceeding the configured timeouts. This demonstrates the failure of the SRT server to properly handle connection terminations and enforce timeout settings.
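The reproduction steps above can be automated with a small harness. The following is a minimal sketch, not part of the original report: the function name measure_exit_delay and its parameters are hypothetical, and it assumes ffmpeg and srt-live-transmit are on the PATH when the article's commands are used. It starts the publisher and the consumer, terminates the publisher, and measures how long the consumer takes to exit.

```python
import subprocess
import time
from typing import List, Optional

# Commands copied from the reproduction steps; the shell redirection to
# /dev/null is replaced by stdout=DEVNULL below.
PUBLISHER_CMD = [
    "ffmpeg", "-re", "-i", "test.ts", "-c", "copy", "-f", "mpegts",
    "srt://127.0.0.1:1935?streamid=#!::h=/live/test,m=publish",
]
CONSUMER_CMD = [
    "srt-live-transmit", "-nokm",
    "srt://127.0.0.1:1935?streamid=#!::h=live/test,m=request"
    "&recv_timeout=3000&idle_timeout=3000&send_timeout=3000",
    "file://con:stdout",
]


def measure_exit_delay(publisher_cmd: List[str], consumer_cmd: List[str],
                       run_for: float, max_wait: float,
                       consumer_delay: float = 1.0) -> Optional[float]:
    """Terminate the publisher after run_for seconds and time the consumer.

    Returns the seconds the consumer needed to exit after the publisher was
    terminated, or None if it was still running after max_wait seconds.
    All parameter names are hypothetical helpers for this sketch.
    """
    quiet = dict(stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL)
    publisher = subprocess.Popen(publisher_cmd, **quiet)
    consumer = None
    try:
        time.sleep(consumer_delay)  # give the publisher time to connect
        consumer = subprocess.Popen(consumer_cmd, **quiet)
        time.sleep(run_for)
        publisher.terminate()
        publisher.wait()
        started = time.monotonic()
        try:
            consumer.wait(timeout=max_wait)
        except subprocess.TimeoutExpired:
            return None  # consumer is still waiting: the hang described above
        return time.monotonic() - started
    finally:
        # Never leave child processes behind, whatever the outcome.
        for proc in (publisher, consumer):
            if proc is not None and proc.poll() is None:
                proc.kill()
                proc.wait()
```

With the article's scenario, a call such as measure_exit_delay(PUBLISHER_CMD, CONSUMER_CMD, run_for=120, max_wait=30) would be expected to return a value close to 3 seconds on a healthy server and None on an affected one. The 30-second upper bound is an arbitrary choice for this sketch.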
Observed Behavior and Logs
When the publishing side connection is closed, the consuming client (srt-live-transmit) does not honor the configured SRT timeouts. It waits indefinitely, failing to recognize the connection termination. This behavior contradicts the expected functionality, where the client should have closed the connection after the specified timeout period.
In the server logs, a specific message indicates a potential issue:
[2025-11-25 06:39:37.203][ERROR][1283004][8372j981][62][SRT] /home/ubuntu/srs/repo/trunk/objs/Platform-SRS7-Linux-6.8.0-GCC13.3.0-x86_64/srt-1-fit/srtcore/api.cpp:2638(epoll_remove_usock) # : remove_usock: @606932821 not found as either socket or group. Removing only from epoll system.(Timer expired)
This log message, while indicating a timer expiration, is misleading. It suggests that the timeout was honored, but the client's continued waiting demonstrates that this was not the case. The message is printed only after the client is forcefully terminated (e.g., via Ctrl-C), further highlighting the discrepancy between the log output and the actual behavior.
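When scanning server logs for this message, it can help to split the line into its bracketed fields. The sketch below is an illustration only: the field names pid, cid, and code are assumptions about what the bracketed values mean, inferred from the sample line rather than taken from SRS documentation, and the function name parse_srs_line is hypothetical.

```python
import re
from typing import Dict, Optional

# Field names are assumptions based on the sample line in this article.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<level>[^\]]+)\]"
    r"\[(?P<pid>\d+)\]"
    r"\[(?P<cid>[^\]]+)\]"
    r"\[(?P<code>\d+)\]"
    r"\[(?P<tag>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)
# Matches the socket id in the remove_usock message quoted above.
SOCKET_RE = re.compile(r"remove_usock: @(\d+) not found")


def parse_srs_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    socket = SOCKET_RE.search(fields["message"])
    if socket is not None:
        # Only present for the epoll_remove_usock message.
        fields["socket_id"] = socket.group(1)
    return fields
```

Because the message appears only after the client is forcefully terminated, a match indicates that a client was killed, not that a timeout fired on its own.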
Root Cause Analysis
The root cause of this bug appears to stem from two primary issues:
- Missing EOF Notification: The SRS SRT server fails to propagate an EOF notification to clients when the publishing side connection is closed. This lack of notification leaves the client unaware of the connection termination and prevents it from initiating a graceful shutdown.
- Timeout Handling Failure: The server does not properly enforce the configured SRT timeout settings (both client-side and server-side). This means that even if timeouts are set, the server does not actively monitor the connection for idleness or receive inactivity, leading to connections remaining open indefinitely.
These two issues compound each other, resulting in the observed behavior. The absence of an EOF notification, combined with the failure to honor timeouts, creates a situation where clients are left in a perpetual waiting state, consuming resources without any active data transmission. This can have significant implications for the scalability and stability of streaming applications.
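The interplay of the two mechanisms can be illustrated with a toy model. This is a sketch of the expected behavior only, not SRS code: the StreamHub class, the EOF sentinel, and read_until_eof are hypothetical names. The hub pushes an end-of-stream marker to every subscriber when the publisher closes, and the reader falls back to an idle timeout if that marker never arrives.

```python
import queue
from typing import Dict, List, Tuple

EOF = None  # sentinel placed on a subscriber queue when the publisher is gone


class StreamHub:
    """Toy fan-out hub: one publisher per stream name, many subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[queue.Queue]] = {}

    def subscribe(self, stream: str) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.setdefault(stream, []).append(q)
        return q

    def publish(self, stream: str, packet: bytes) -> None:
        for q in self._subscribers.get(stream, []):
            q.put(packet)

    def close_publisher(self, stream: str) -> None:
        # The step the article describes as missing: tell every subscriber
        # that no more data will arrive.
        for q in self._subscribers.pop(stream, []):
            q.put(EOF)


def read_until_eof(q: queue.Queue, idle_timeout: float) -> Tuple[List[bytes], str]:
    """Drain a subscriber queue; return the packets and the stop reason."""
    packets: List[bytes] = []
    while True:
        try:
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            return packets, "idle_timeout"  # backstop when EOF never arrives
        if item is EOF:
            return packets, "eof"
        packets.append(item)
```

In this model either mechanism alone is enough to release the subscriber. The hang described in this article corresponds to both being absent at once.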
Industry Standards and Expectations
When a publisher's connection ends, a streaming server is generally expected to notify subscribers of the EOF condition promptly, so that clients can shut down gracefully, avoid resource leaks, and keep the user experience smooth. As highlighted by Haivision, a leading provider of SRT solutions, proper error handling and connection management are crucial for robust SRT implementations. An SRT server is likewise expected to monitor connections and enforce its configured timeouts so that idle connections do not linger indefinitely, which keeps resource usage efficient and the system stable.
Implications and Impact
This bug can have several significant implications for applications using the SRS SRT server:
- Resource Leaks: Clients that fail to recognize connection terminations can continue to hold resources, such as sockets and memory, leading to resource exhaustion over time.
- Stability Issues: Unclosed connections can contribute to instability, especially in high-volume streaming environments where numerous connections are established and terminated frequently.
- Unexpected Behavior: The lack of EOF notification can cause unexpected behavior in client applications, potentially leading to errors and a degraded user experience.
- Difficulty in Diagnosing Issues: The misleading log messages can make it challenging to diagnose the root cause of connection-related problems, prolonging troubleshooting efforts.
These implications underscore the importance of addressing this bug promptly to ensure the reliability and stability of SRS-based SRT streaming applications. Developers and users need to be aware of these potential issues and implement appropriate workarounds or mitigation strategies until a fix is available.
Potential Workarounds
While a permanent fix for this bug requires modifications to the SRS server codebase, there are several potential workarounds that can be employed in the interim:
- Application-Level Timeout Handling: Implement timeout mechanisms within the client application itself. This involves setting timers that monitor the connection and trigger a closure if no data is received within a specified period. This workaround provides a safety net in case the server-side timeouts are not functioning correctly.
- Periodic Health Checks: Implement periodic health checks from the client to the server. These checks can involve sending lightweight requests to verify the connection's status. If a health check fails, the client can proactively close the connection.
- Connection Pooling with Timeouts: Use connection pooling techniques with aggressive timeout settings. This involves reusing existing connections whenever possible but ensuring that connections are closed if they remain idle for a certain duration.
- Monitoring and Alerting: Implement monitoring and alerting systems that track the number of active SRT connections. If the number of connections exceeds a certain threshold or if connections remain open for an unexpectedly long time, alerts can be triggered to investigate the issue.
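As an illustration of application-level timeout handling, the sketch below wraps a consumer process and kills it when it has produced no output for a given period. It is one possible approach under stated assumptions: the function name run_with_idle_timeout is hypothetical, and it assumes the consumer writes the received stream to its standard output, as the srt-live-transmit command in this article does.

```python
import os
import queue
import subprocess
import threading
from typing import List


def run_with_idle_timeout(cmd: List[str], idle_timeout: float,
                          chunk_size: int = 65536) -> str:
    """Run cmd and kill it if its stdout stays silent for idle_timeout seconds.

    Returns "eof" if the process closed its output on its own, or
    "idle_timeout" if the watchdog had to stop it.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    chunks: queue.Queue = queue.Queue()

    def pump() -> None:
        # os.read returns as soon as any data is available, so even a
        # trickle of bytes counts as activity. b"" marks end of stream.
        fd = proc.stdout.fileno()
        while True:
            data = os.read(fd, chunk_size)
            chunks.put(data)
            if not data:
                return

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    try:
        while True:
            try:
                data = chunks.get(timeout=idle_timeout)
            except queue.Empty:
                return "idle_timeout"
            if not data:
                return "eof"
            # A real client would pass `data` on to its decoder or writer here.
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        reader.join(timeout=1.0)
        proc.stdout.close()
```

The timer runs in the wrapper, not in the SRT library, so it does not depend on the server sending EOF or on SRT-level timeouts taking effect. The trade-off is that a legitimately paused stream is also cut off once the idle period elapses.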
These workarounds can help mitigate the impact of the bug and improve the stability of SRT streaming applications. However, they should be considered temporary measures until a proper fix is implemented in the SRS server.
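For the monitoring workaround, one simple signal is a connection whose byte counter has stopped changing. The sketch below assumes some external source of per-connection byte counters, which is left unspecified here; the StallDetector class and its methods are hypothetical names for illustration.

```python
from typing import Dict, List, Tuple


class StallDetector:
    """Flag connections whose byte counter has not changed for max_idle seconds."""

    def __init__(self, max_idle: float) -> None:
        self.max_idle = max_idle
        # conn_id -> (last counter value, time that value was first seen)
        self._last: Dict[str, Tuple[int, float]] = {}

    def observe(self, conn_id: str, byte_counter: int, now: float) -> None:
        previous = self._last.get(conn_id)
        if previous is None or previous[0] != byte_counter:
            self._last[conn_id] = (byte_counter, now)

    def forget(self, conn_id: str) -> None:
        """Call when a connection is closed so it is no longer tracked."""
        self._last.pop(conn_id, None)

    def stalled(self, now: float) -> List[str]:
        return sorted(
            conn_id
            for conn_id, (_, changed_at) in self._last.items()
            if now - changed_at > self.max_idle
        )


# Example with explicit timestamps (seconds); 8.0 mirrors peer_idle_timeout 8000.
detector = StallDetector(max_idle=8.0)
detector.observe("a", 100, now=0.0)
detector.observe("b", 100, now=0.0)
detector.observe("a", 200, now=5.0)   # "a" is still receiving data
detector.observe("b", 100, now=5.0)   # "b" has not moved
print(detector.stalled(now=9.0))      # ['b']
```

Passing timestamps in explicitly keeps the detector easy to test. In a deployment, the caller would supply time.monotonic() and raise an alert for every id the detector returns.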
Conclusion
The bug in SRS version 7.0.128 concerning SRT EOF notification and timeout handling presents a significant challenge for developers and users relying on this server for streaming applications. The failure to properly notify clients of connection terminations and the inability to enforce timeout settings can lead to resource leaks, stability issues, and unexpected behavior. Understanding the root cause, implications, and potential workarounds for this bug is crucial for mitigating its impact and ensuring the reliability of SRT streaming deployments.
It is essential for the SRS development team to address this issue promptly to restore the expected behavior of the SRT server. In the meantime, the workarounds discussed in this article can provide temporary solutions to minimize the impact of the bug. By staying informed and implementing appropriate mitigation strategies, developers and users can continue to leverage the benefits of SRT for robust and reliable media streaming.
For further information on SRT and best practices, you can visit the SRT Alliance website.