Adaptive Backoff For Git Polling: Implementation Guide
The WorktreeMonitor within Canopy’s UI acts as the system's heartbeat, diligently checking Git status. Currently, it employs a fixed polling interval, which defaults to 2 seconds. While this approach works adequately for smaller repositories, it encounters significant challenges when dealing with large repositories, such as the Linux kernel or massive monorepos. In these scenarios, the git diff --numstat command can take longer to execute than the polling interval, leading to a buildup of processes in the queue, spikes in CPU usage, and frustrating UI freezes. This article delves into the intricacies of implementing adaptive backoff and congestion control to optimize Git polling, ensuring a smoother and more efficient user experience.
Understanding the Current State of Git Polling
Currently, the polling mechanism within electron/services/WorktreeMonitor.ts operates on a fixed timer. This means that regardless of the repository size or the time required for Git operations, the system initiates a new poll at a predetermined interval. This can lead to inefficiencies and performance issues, especially in larger repositories. To fully grasp the scope of the problem and the proposed solutions, let's delve into the current state, the associated challenges, and the underlying motivations for change.
Current Implementation Details
In the existing implementation, the polling process is governed by a fixed timer. Specifically, the code snippet below from electron/services/WorktreeMonitor.ts illustrates how the polling interval is set using setInterval():
// Lines 931-934
this.pollingTimer = setInterval(() => {
  void this.updateGitStatus();
}, this.pollingInterval);
This setup means that the updateGitStatus() function is called repeatedly at the interval specified by this.pollingInterval. While straightforward, this approach doesn't account for the variability in Git operation times, especially in large repositories. The implications of this fixed interval are far-reaching, affecting system performance, battery life, and overall user experience.
Affected Files
To understand the full scope of the current polling mechanism, it’s essential to identify the key files involved:
- electron/services/WorktreeMonitor.ts: This file contains the main polling logic, dictating how and when Git status updates are performed.
- electron/utils/git.ts: This utility module houses the Git operations that the monitor calls, such as fetching diff stats.
- electron/services/WorktreeService.ts: This service manages multiple monitors, making it a crucial component in understanding the system's overall polling behavior.
Key Challenges with the Current Approach
- Process Overlap: When the time required for git diff --numstat HEAD exceeds the polling interval (for instance, 3 seconds for the Git operation versus a 2-second polling interval), multiple processes can run concurrently. This overlap can lead to resource contention and performance degradation.
- CPU Spikes: Repeated calls to git status, especially in repositories with 10,000+ files, can saturate the CPU. This is because each git status call involves traversing the file system and computing the necessary status information, which can be computationally intensive.
- Battery Drain: Continuous polling, even in idle worktrees, drains battery power on laptops. This is a significant concern for users who rely on their laptops for extended periods without access to a power source.
- Lag: The isUpdating lock, designed to prevent overlaps, can inadvertently cause skipped polls and delayed updates. While it prevents crashes, it doesn't address the root cause of the problem, which is polling faster than Git can respond.
Illustrative Code Snippet
The following code snippet from WorktreeMonitor.ts exemplifies the locking mechanism in place to prevent overlapping updates:
private async updateGitStatus(forceRefresh: boolean = false): Promise<void> {
  // Prevent overlapping updates
  if (!this.isRunning || this.isUpdating) {
    return; // Silently drops the poll if previous one is still running
  }
  this.isUpdating = true;
  // ... (remainder of the method omitted)
This lock ensures that only one update operation runs at a time, preventing race conditions and crashes. However, it also means that if a Git operation takes longer than the polling interval, subsequent polls will be dropped, leading to missed updates and a delayed view of the repository's status.
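To make the dropped-poll effect concrete, here is a minimal simulation. It is not code from Canopy: simulateFixedInterval is a hypothetical helper that models a fixed timer guarded by an isUpdating-style lock, under the simplifying assumption that every Git operation takes the same amount of time.

```typescript
// Hypothetical simulation (not part of Canopy): models a fixed-interval timer
// guarded by an isUpdating-style lock, assuming a constant Git operation time.
interface FixedIntervalResult {
  started: number[]; // tick times (ms) at which an update actually ran
  dropped: number[]; // tick times (ms) skipped because the lock was held
}

function simulateFixedInterval(
  intervalMs: number,
  operationMs: number,
  totalMs: number
): FixedIntervalResult {
  const started: number[] = [];
  const dropped: number[] = [];
  let busyUntil = 0; // time at which the in-flight update finishes

  for (let t = intervalMs; t <= totalMs; t += intervalMs) {
    if (t < busyUntil) {
      dropped.push(t); // previous update still running: poll is silently skipped
    } else {
      started.push(t);
      busyUntil = t + operationMs;
    }
  }
  return { started, dropped };
}

// A 3-second Git operation against a 2-second timer, observed for 12 seconds.
const fixedResult = simulateFixedInterval(2000, 3000, 12000);
console.log(fixedResult.started); // [ 2000, 6000, 10000 ]
console.log(fixedResult.dropped); // [ 4000, 8000, 12000 ]
```

Under these assumptions every other tick is dropped, so the status refreshes every 4 seconds rather than every 2, and half of the timer wake-ups do no useful work.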
The Motivation Behind Adaptive Backoff
The primary motivation for implementing adaptive backoff and congestion control lies in preventing the application from exhibiting undesirable behaviors such as freezing on large repositories, wasting CPU and battery resources, and lagging behind on actual changes due to skipped polls. These issues can significantly degrade the user experience, making the application feel sluggish and unresponsive.
Specific Scenarios and Their Requirements
- Small Repositories (< 100 files): In small repositories, the existing 2-second interval works reasonably well. The goal is to maintain this responsiveness without introducing unnecessary overhead.
- Large Repositories (> 5,000 files): For larger repositories, Git operations can take anywhere from 3 to 10 seconds or even longer. In such cases, an adaptive backoff mechanism is crucial to prevent process overlap and CPU saturation.
- Idle Worktrees: When worktrees are idle, i.e., there are no changes for an extended period (e.g., 5+ minutes), there's no need to poll as frequently. Lengthening the polling interval to, say, 30 seconds can significantly conserve resources.
Expected Improvements
By implementing adaptive backoff and congestion control, Canopy aims to achieve the following improvements:
- Enhanced Responsiveness: The application should remain responsive even when working with large repositories.
- Reduced Resource Consumption: CPU and battery usage should be minimized, especially in idle worktrees.
- Timely Updates: The UI should reflect the actual status of the repository without significant delays.
Context: WorktreeMonitor and Its Role
The WorktreeMonitor serves as the cornerstone of Canopy's UI, playing a pivotal role in maintaining an up-to-date view of the repository's state. It achieves this by:
- Polling git status via getWorktreeChangesWithStats() in electron/utils/git.ts.
- Running git diff --numstat HEAD to calculate insertion/deletion statistics.
- Triggering AI summary generation on state changes.
- Emitting IPC events to update the UI, specifically WorktreeCard components.
Performance Bottlenecks in git.ts
While efforts have been made to optimize Git operations, performance bottlenecks persist, particularly in the following areas:
- getWorktreeChangesWithStats currently limits numstat to 100 files (line 153), which helps mitigate the impact of large repositories but doesn't eliminate the problem entirely.
- Untracked files are batched with a concurrency limit of 10 (line 326), which improves efficiency but doesn't address the issue of slow Git operations.
- Crucially, there's no backoff mechanism in place when Git operations themselves are slow. This lack of backoff is a significant contributor to the performance issues experienced with large repositories.
Deliverables: The Path Forward
The implementation of adaptive backoff and congestion control requires a multifaceted approach, encompassing code changes, testing, and documentation. The deliverables outlined below provide a roadmap for achieving these goals.
Code Changes: Modifying the Polling Logic
The primary focus of the code changes will be on replacing the fixed interval polling mechanism with an adaptive scheduler. This involves modifying the WorktreeMonitor.ts file to incorporate the new logic.
Files to Modify
- electron/services/WorktreeMonitor.ts: This file is the central hub for the changes, as it contains the existing polling logic that needs to be replaced.
- electron/utils/git.ts: This file will be modified to optionally add operation duration tracking, providing valuable data for the adaptive backoff mechanism.
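One way the optional duration tracking could look is a small wrapper around any async Git call. This is a sketch only: withDuration, its callback parameter, and the injectable clock are hypothetical names, not functions that exist in git.ts today.

```typescript
// Hypothetical helper (not an existing function in git.ts): runs an async
// operation and reports its duration through a callback.
async function withDuration<T>(
  operation: () => Promise<T>,
  onDuration: (durationMs: number) => void,
  now: () => number = Date.now // injectable clock, convenient for tests
): Promise<T> {
  const startedAt = now();
  try {
    return await operation();
  } finally {
    onDuration(now() - startedAt); // runs on success and on failure
  }
}

// Usage sketch: fakeGitStatus stands in for a real call such as
// getWorktreeChangesWithStats().
async function durationDemo(): Promise<void> {
  const fakeGitStatus = async (): Promise<string[]> => ["src/index.ts"];
  let lastOperationDuration = 0;
  const changes = await withDuration(fakeGitStatus, (ms) => {
    lastOperationDuration = ms;
  });
  console.log(changes, lastOperationDuration);
}
void durationDemo();
```

Reporting the duration from a finally block means a slow operation that ends in an error still feeds the backoff calculation.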
Proposed Architecture
The proposed architecture involves replacing the fixed setInterval() with a self-scheduling pattern. This pattern allows the system to dynamically adjust the polling interval based on the performance of the Git operations.
The following code snippet illustrates the proposed self-scheduling pattern:
// WorktreeMonitor.ts
private pollingInterval: number = 2000; // Base interval (ms)
private maxPollingInterval: number = 30000; // Max quiet interval (ms)
private failureCount: number = 0;
private lastOperationDuration: number = 0; // Duration of the last poll (ms)

private scheduleNextPoll(): void {
  if (!this.isRunning || !this.pollingEnabled) return;
  // Cooldown: wait at least 1.5x the last operation duration,
  // never less than the base interval and never more than the max.
  const cooldown = this.lastOperationDuration * 1.5;
  const nextInterval = Math.min(
    Math.max(this.pollingInterval, cooldown),
    this.maxPollingInterval
  );
  this.pollingTimer = setTimeout(() => {
    void this.poll();
  }, nextInterval);
}

private async poll(): Promise<void> {
  const startTime = Date.now();
  try {
    await this.updateGitStatus();
    this.failureCount = 0; // Reset on success
  } catch (error) {
    this.failureCount++;
    // Circuit breaker: after 3 consecutive failures, switch to error mode
    if (this.failureCount >= 3) {
      this.state.mood = "error";
      this.emitUpdate();
      this.stopPolling(); // Stop until manual refresh
      return; // Skip rescheduling below
    }
  } finally {
    // Record the duration for failed polls too, so slow failures also back off
    this.lastOperationDuration = Date.now() - startTime;
  }
  this.scheduleNextPoll(); // Schedule the next poll only AFTER the current one completes
}
Adaptive Interval Logic
The adaptive interval logic operates on the following principles:
- Cooldown Mode: The next poll waits for at least 1.5 times the lastOperationDuration. This ensures that the system doesn't overload itself with frequent polls when Git operations are taking longer.
- Quiet Mode (optional): If no changes are detected for an extended period (e.g., 5+ minutes), the interval is increased to 30 seconds. This conserves resources in idle worktrees.
- Circuit Breaker: After 3 consecutive failures, the system sets the mood to "error" and pauses polling. This prevents the system from continuously attempting to poll when there are persistent issues.
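The cooldown and quiet-mode rules can be expressed as one pure function, which also makes them easy to unit test. The sketch below is illustrative: nextPollInterval, the IntervalInput shape, and the constant names are hypothetical, while the numbers (2 s base, 30 s max, 1.5x cooldown, 5 minutes of inactivity) are the ones quoted in this article.

```typescript
// Hypothetical pure function; names are illustrative, values come from the
// principles listed above.
interface IntervalInput {
  baseIntervalMs: number;    // e.g. 2000
  maxIntervalMs: number;     // e.g. 30000
  lastOperationMs: number;   // duration of the previous poll
  msSinceLastChange: number; // time since the Git status last changed
}

const COOLDOWN_FACTOR = 1.5;
const QUIET_AFTER_MS = 5 * 60 * 1000;

function nextPollInterval(input: IntervalInput): number {
  // Quiet mode: an idle worktree is polled at the maximum interval.
  if (input.msSinceLastChange >= QUIET_AFTER_MS) {
    return input.maxIntervalMs;
  }
  // Cooldown mode: wait at least 1.5x the last operation, never less than
  // the base interval and never more than the maximum.
  const cooldown = input.lastOperationMs * COOLDOWN_FACTOR;
  return Math.min(Math.max(input.baseIntervalMs, cooldown), input.maxIntervalMs);
}

const intervalDefaults = { baseIntervalMs: 2000, maxIntervalMs: 30000 };
console.log(nextPollInterval({ ...intervalDefaults, lastOperationMs: 200, msSinceLastChange: 0 }));      // 2000
console.log(nextPollInterval({ ...intervalDefaults, lastOperationMs: 5000, msSinceLastChange: 0 }));     // 7500
console.log(nextPollInterval({ ...intervalDefaults, lastOperationMs: 40000, msSinceLastChange: 0 }));    // 30000
console.log(nextPollInterval({ ...intervalDefaults, lastOperationMs: 200, msSinceLastChange: 600000 })); // 30000
```

Keeping the calculation free of timers and instance state means the scheduler only has to pass in its current measurements and call setTimeout with the result.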
Configuration Options
To provide flexibility and customization, the following configuration options will be added to electron/types/config.ts:
monitor?: {
  pollIntervalActive?: number; // Base interval (default 2000ms)
  pollIntervalBackground?: number; // Unused in adaptive mode
  pollIntervalMax?: number; // Max quiet interval (default 30000ms)
  adaptiveBackoff?: boolean; // Enable adaptive backoff (default true)
}
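A sketch of how these options might be resolved into concrete values at startup follows. The option names and defaults are taken from the listing above; resolveMonitorConfig, ResolvedMonitorConfig, and the guard against a maximum smaller than the base interval are assumptions made for illustration.

```typescript
// Sketch only: MonitorConfig copies the option names listed above;
// resolveMonitorConfig and its fallback handling are hypothetical.
interface MonitorConfig {
  pollIntervalActive?: number;
  pollIntervalBackground?: number;
  pollIntervalMax?: number;
  adaptiveBackoff?: boolean;
}

interface ResolvedMonitorConfig {
  pollIntervalActive: number;
  pollIntervalMax: number;
  adaptiveBackoff: boolean;
}

function resolveMonitorConfig(config: MonitorConfig = {}): ResolvedMonitorConfig {
  const pollIntervalActive = config.pollIntervalActive ?? 2000;
  // Assumed guard: never let the maximum fall below the base interval.
  const pollIntervalMax = Math.max(config.pollIntervalMax ?? 30000, pollIntervalActive);
  return {
    pollIntervalActive,
    pollIntervalMax,
    adaptiveBackoff: config.adaptiveBackoff ?? true,
  };
}

console.log(resolveMonitorConfig());
console.log(resolveMonitorConfig({ pollIntervalActive: 5000, adaptiveBackoff: false }));
```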
Tests: Ensuring Robustness and Reliability
Testing is a critical component of the implementation process. A comprehensive suite of tests will be developed to ensure the robustness and reliability of the adaptive backoff mechanism.
Types of Tests
- Unit Tests: Verify that scheduleNextPoll() calculates the correct intervals based on different scenarios.
- Integration Tests: Simulate slow Git operations and verify that the backoff mechanism functions as expected.
- Circuit Breaker Tests: Ensure that 3 failures trigger the error state and that the system responds appropriately.
- Performance Tests: Verify that there are no overlapping Git processes and that the system performs efficiently under various loads.
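The no-overlap property can be prototyped without spawning Git at all by replaying the schedule on a virtual clock. The helpers below are hypothetical test utilities, not part of Canopy's test suite; they assume the 1.5x cooldown and the 2 s / 30 s bounds described earlier.

```typescript
// Hypothetical test helpers: replay a self-scheduling loop on a virtual clock.
interface PollWindow {
  start: number; // ms at which the simulated Git operation begins
  end: number;   // ms at which it finishes
}

function simulateSelfScheduling(
  operationDurationsMs: number[],
  baseIntervalMs: number = 2000,
  maxIntervalMs: number = 30000
): PollWindow[] {
  const pollWindows: PollWindow[] = [];
  let clock = 0;
  for (const duration of operationDurationsMs) {
    const start = clock;
    const end = start + duration;
    pollWindows.push({ start, end });
    // The next poll is scheduled only after this one has finished.
    const wait = Math.min(Math.max(baseIntervalMs, duration * 1.5), maxIntervalMs);
    clock = end + wait;
  }
  return pollWindows;
}

// True if any window starts before the previous one has ended.
function hasOverlap(pollWindows: PollWindow[]): boolean {
  let previousEnd = -Infinity;
  for (const w of pollWindows) {
    if (w.start < previousEnd) return true;
    previousEnd = w.end;
  }
  return false;
}

const simulated = simulateSelfScheduling([300, 5000, 8000, 400]);
console.log(simulated.map((w) => w.start)); // [ 0, 2300, 14800, 34800 ]
console.log(hasOverlap(simulated));         // false

// For contrast: a 3 s operation started every 2 s on a fixed timer overlaps.
console.log(hasOverlap([{ start: 0, end: 3000 }, { start: 2000, end: 5000 }])); // true
```

A real integration test would replace the duration array with a stubbed Git call that resolves after a delay, but the assertion on the recorded windows stays the same.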
Documentation: Guiding Users and Developers
Clear and comprehensive documentation is essential for guiding users and developers on how to use and maintain the adaptive backoff mechanism.
Documentation Updates
- Update CLAUDE.md to explain the adaptive polling behavior.
- Add JSDoc comments to the code to provide detailed explanations of the self-scheduling pattern and other key components.
- Document the configuration options in docs/spec.md, providing guidance on how to customize the polling behavior.
Technical Specifications: A Deep Dive
To fully understand the implications of implementing adaptive backoff and congestion control, it’s crucial to delve into the technical specifications. This section provides a detailed look at the footprint, performance considerations, and backward compatibility aspects of the proposed changes.
Footprint: Identifying the Core Modules
The implementation of adaptive backoff primarily affects the following modules:
- Core Modules: The WorktreeMonitor polling logic is the heart of the changes, as it dictates how and when Git status updates are performed.
- Git Utilities: Optional duration tracking in getWorktreeChangesWithStats provides valuable data for the adaptive backoff mechanism.
- Config System: The new monitor.pollIntervalMax option allows users to customize the maximum polling interval in quiet mode.
Performance Considerations: Balancing Responsiveness and Efficiency
Adaptive backoff is designed to strike a balance between responsiveness and efficiency. The performance considerations vary depending on the scenario:
- Best Case (Small Repo): In small repositories, there should be no noticeable change in behavior. The system will continue to poll every 2 seconds, maintaining responsiveness.
- Large Repo: In large repositories, the system will automatically adapt to 5-10 second intervals based on Git operation duration. This prevents process overlap and CPU saturation.
- Idle Repo: In idle repositories, the interval can be increased to 30 seconds after 5 minutes of inactivity, conserving resources.
- Failed Repo: If Git operations fail repeatedly, the system will stop polling after 3 failures to prevent CPU waste.
Backward Compatibility: Ensuring a Smooth Transition
Backward compatibility is a key concern. The implementation is designed to ensure a smooth transition for existing users:
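- The base interval remains 2 seconds, so repositories whose Git operations finish quickly keep the current polling cadence. Adaptive backoff is controlled by the adaptiveBackoff configuration option.
- The existing setPollingInterval() API still works for manual override, providing flexibility for advanced users.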
Dependencies: Identifying Blocking and Informational Factors
To ensure a smooth implementation process, it’s essential to identify any dependencies that might impact the work.
Blocking Dependencies
- Currently, there are no blocking dependencies. The implementation can proceed independently without waiting for other tasks to complete.
Informational Dependencies
- Issue #92 (Reduce Technical Debt) is an informational dependency. It addresses a scalability risk identified in code review and provides valuable context for the adaptive backoff implementation.
Tasks: A Detailed Action Plan
To ensure a structured and efficient implementation process, the following tasks have been identified:
- [ ] Modify WorktreeMonitor.ts:926-944 to replace setInterval() with self-scheduling.
- [ ] Add lastOperationDuration tracking in the poll() method.
- [ ] Implement adaptive interval calculation in scheduleNextPoll().
- [ ] Add circuit breaker logic (3 failures → error state).
- [ ] (Optional) Add "quiet mode" that increases the interval after 5 minutes of inactivity.
- [ ] Add configuration options to electron/types/config.ts.
- [ ] Write unit tests for interval calculation logic.
- [ ] Test on a large repository (e.g., a clone of the Linux kernel).
- [ ] Update documentation to explain the adaptive behavior.
Acceptance Criteria: Defining Success
To ensure that the implementation meets the desired goals, the following acceptance criteria have been defined:
- [ ] Git operations never overlap (no concurrent git diff processes).
- [ ] The polling interval adapts to operation duration (if Git takes 5s, wait 7.5s before the next poll).
- [ ] The circuit breaker triggers after 3 consecutive failures.
- [ ] WorktreeMonitor sets the mood to "error" when polling stops.
- [ ] Manual refresh (Cmd+R) resets the circuit breaker and resumes polling.
- [ ] No performance regression on small repositories (< 100 files).
Edge Cases & Risks: Anticipating Challenges
As with any complex implementation, it’s essential to anticipate potential edge cases and risks. This section outlines the identified risks and edge cases, along with mitigation strategies.
Risks
- Delayed UI Updates: If Git is slow, UI updates may lag by 10-20 seconds.
  - Mitigation: Keep the base interval at 2s and only apply backoff when operations actually slow down.
- Circuit Breaker Too Aggressive: 3 failures might be too low for flaky networks.
  - Mitigation: Make the failure threshold configurable or use exponential backoff instead.
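As a rough illustration of the exponential-backoff mitigation, the retry delay could double with each consecutive failure instead of stopping after a fixed count. failureBackoffMs and its default values are assumptions made for this sketch, not part of the proposed design.

```typescript
// Hypothetical alternative to a hard stop: grow the retry delay with each
// consecutive failure, capped at a maximum. Names and defaults are assumptions.
function failureBackoffMs(
  consecutiveFailures: number,
  baseIntervalMs: number = 2000,
  maxIntervalMs: number = 30000
): number {
  if (consecutiveFailures <= 0) return baseIntervalMs;
  const delay = baseIntervalMs * Math.pow(2, consecutiveFailures);
  return Math.min(delay, maxIntervalMs);
}

for (let failures = 0; failures <= 5; failures++) {
  console.log(failures, failureBackoffMs(failures));
}
// 0 2000
// 1 4000
// 2 8000
// 3 16000
// 4 30000
// 5 30000
```

With these defaults the delay reaches the 30-second cap on the fourth failure, so a persistently failing worktree costs one Git invocation every 30 seconds rather than none at all; whether that trade-off beats a hard stop depends on how often failures turn out to be transient.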
Edge Cases
- Transient Git Lock: If an index.lock collision happens, don't increment the failure count.
- Worktree Deleted: The circuit breaker should distinguish between "slow" and "gone" worktrees.
- Manual Refresh: A user pressing "Refresh" should reset the failure count and backoff.
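The three edge cases above could be handled by a small breaker object that classifies each failure before counting it. Everything in this sketch is hypothetical: the class and method names, the decision to stop immediately when the worktree is gone, and the message-matching heuristics, which would need to be checked against the errors the Git wrapper actually surfaces.

```typescript
// Sketch of the edge-case handling listed above. Names and matching
// heuristics are assumptions, not Canopy code.
type FailureKind = "transient-lock" | "worktree-gone" | "other";

function classifyGitFailure(message: string): FailureKind {
  // Lock contention errors from Git name the index.lock file.
  if (message.indexOf("index.lock") !== -1) return "transient-lock";
  // Heuristic: a missing path or repository suggests the worktree was deleted.
  const lower = message.toLowerCase();
  if (lower.indexOf("enoent") !== -1 || lower.indexOf("not a git repository") !== -1) {
    return "worktree-gone";
  }
  return "other";
}

class PollCircuitBreaker {
  private failureCount = 0;
  private tripped = false;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  recordSuccess(): void {
    this.failureCount = 0;
  }

  // Returns true when polling should stop.
  recordFailure(message: string): boolean {
    const kind = classifyGitFailure(message);
    if (kind === "transient-lock") return this.tripped; // not counted
    if (kind === "worktree-gone") {
      this.tripped = true; // assumed policy: stop immediately
      return true;
    }
    this.failureCount++;
    if (this.failureCount >= this.threshold) this.tripped = true;
    return this.tripped;
  }

  // Intended to be called on manual refresh.
  reset(): void {
    this.failureCount = 0;
    this.tripped = false;
  }

  isOpen(): boolean {
    return this.tripped;
  }
}

const breaker = new PollCircuitBreaker();
breaker.recordFailure("fatal: Unable to create '.git/index.lock': File exists.");
console.log(breaker.isOpen()); // false: lock collisions are ignored
```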
Alternatives Considered: Evaluating Other Options
Before settling on the adaptive backoff approach, several alternatives were considered. This section outlines these alternatives and the rationale for choosing adaptive backoff.
- Fixed Long Interval (10s): This approach is simple but sacrifices responsiveness on small repositories.
- File Watcher + On-Demand Polling: This eliminates polling but requires chokidar integration, adding complexity.
- Process Queue with Concurrency Limit: This doesn't solve the root cause of slow Git operations.
Conclusion: A Smarter Approach to Git Polling
Implementing adaptive backoff and congestion control for Git polling represents a significant step forward in optimizing Canopy’s UI. By dynamically adjusting the polling interval based on Git operation performance, Canopy can avoid the pitfalls of fixed-interval polling, such as process overlap, CPU spikes, and battery drain. This smarter approach ensures a more responsive and efficient user experience, particularly when working with large repositories. This proactive solution not only addresses current performance bottlenecks but also lays the groundwork for future scalability and efficiency improvements.
For more in-depth information on Git performance and optimization, consider exploring resources like the Pro Git book, a comprehensive guide to Git that covers various aspects of Git performance tuning.