Adaptive Backoff For Git Polling: Implementation Guide

by Alex Johnson

The WorktreeMonitor acts as the heartbeat of Canopy’s UI: it polls Git status on a fixed interval that defaults to 2 seconds. That works for small repositories but breaks down on large ones, such as the Linux kernel or a big monorepo. There, the git diff --numstat command can take longer than the polling interval, which leads to piled-up processes, CPU spikes, and UI freezes. This article describes how to add adaptive backoff and congestion control to Git polling so that the monitor stays responsive without wasting resources.

Understanding the Current State of Git Polling

The polling mechanism in electron/services/WorktreeMonitor.ts runs on a fixed timer. Regardless of repository size or how long Git operations take, a new poll starts at every interval. This causes inefficiency and performance problems, especially in larger repositories. The sections below cover the current implementation, the challenges it creates, and the motivation for changing it.

Current Implementation Details

In the existing implementation, the polling process is governed by a fixed timer. Specifically, the code snippet below from electron/services/WorktreeMonitor.ts illustrates how the polling interval is set using setInterval():

// Lines 931-934
this.pollingTimer = setInterval(() => {
  void this.updateGitStatus();
}, this.pollingInterval);

This setup means that updateGitStatus() is called repeatedly at the interval specified by this.pollingInterval. While straightforward, this approach ignores how long each Git operation actually takes, which varies widely in large repositories. The fixed interval therefore affects CPU usage, battery life, and UI responsiveness.
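To make the overlap risk concrete, here is a minimal sketch that estimates how many polls would be in flight at once if nothing guarded against overlap. The function name maxConcurrentPolls is hypothetical and not part of Canopy; it assumes every poll takes the same amount of time.

```typescript
// Hypothetical helper (not part of Canopy): estimates the peak number of
// simultaneously running polls when a new poll starts every `intervalMs`
// and each poll takes `durationMs`, with no overlap guard in place.
function maxConcurrentPolls(intervalMs: number, durationMs: number): number {
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  // A poll started at time t is still running at every later start time
  // that falls before t + durationMs.
  return Math.max(1, Math.ceil(durationMs / intervalMs));
}

console.log(maxConcurrentPolls(2000, 500));   // fast Git operation
console.log(maxConcurrentPolls(2000, 3000));  // operation slower than the interval
console.log(maxConcurrentPolls(2000, 10000)); // very slow operation
```

With a 2-second interval, anything slower than 2 seconds yields more than one concurrent poll under these assumptions.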

Affected Files

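The current polling mechanism involves two files:

  • electron/services/WorktreeMonitor.ts: contains the fixed-interval polling timer and the updateGitStatus() method.
  • electron/utils/git.ts: contains getWorktreeChangesWithStats(), which runs the underlying Git commands.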

Key Challenges with the Current Approach

  1. Process Overlap: When the time required for git diff --numstat HEAD exceeds the polling interval (for instance, 3 seconds for the Git operation versus a 2-second polling interval), multiple processes can run concurrently. This overlap can lead to resource contention and performance degradation.
  2. CPU Spikes: Repeated calls to git status, especially in repositories with 10,000+ files, can saturate the CPU. This is because each git status call involves traversing the file system and computing the necessary status information, which can be computationally intensive.
  3. Battery Drain: Continuous polling, even in idle worktrees, drains battery power on laptops. This is a significant concern for users who rely on their laptops for extended periods without access to a power source.
  4. Lag: The isUpdating lock, designed to prevent overlaps, can inadvertently cause skipped polls and delayed updates. While it prevents crashes, it doesn't address the root cause of the problem, which is polling faster than Git can respond.

Illustrative Code Snippet

The following code snippet from WorktreeMonitor.ts exemplifies the locking mechanism in place to prevent overlapping updates:

private async updateGitStatus(forceRefresh: boolean = false): Promise<void> {
  // Prevent overlapping updates
  if (!this.isRunning || this.isUpdating) {
    return; // Silently drops the poll if previous one is still running
  }
  this.isUpdating = true;
  // ... (remainder of the method omitted in this excerpt)
}

This lock ensures that only one update operation runs at a time, preventing race conditions and crashes. However, it also means that if a Git operation takes longer than the polling interval, subsequent polls will be dropped, leading to missed updates and a delayed view of the repository's status.
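The effect of the lock can be estimated with a small simulation. This is only a sketch: simulateLockedPolling and its parameters are hypothetical names, and it assumes a constant operation duration and a timer that first fires one interval after start.

```typescript
interface LockSimulation {
  executed: number; // ticks that actually ran a Git operation
  dropped: number;  // ticks skipped because the previous one was still running
}

// Hypothetical simulation of fixed-interval polling guarded by an
// isUpdating-style lock. Time is modelled in milliseconds.
function simulateLockedPolling(
  intervalMs: number,
  durationMs: number,
  windowMs: number
): LockSimulation {
  let busyUntil = 0;
  let executed = 0;
  let dropped = 0;
  for (let t = intervalMs; t <= windowMs; t += intervalMs) {
    if (t < busyUntil) {
      dropped += 1; // lock is held: the poll returns immediately
    } else {
      executed += 1;
      busyUntil = t + durationMs;
    }
  }
  return { executed, dropped };
}

// One minute of polling every 2s with a 3s Git operation.
console.log(simulateLockedPolling(2000, 3000, 60000));
```

Under these assumptions, a 3-second operation on a 2-second timer drops every second tick, so the effective refresh rate is 4 seconds rather than the 2 seconds the configuration suggests.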

The Motivation Behind Adaptive Backoff

The primary motivation for implementing adaptive backoff and congestion control lies in preventing the application from exhibiting undesirable behaviors such as freezing on large repositories, wasting CPU and battery resources, and lagging behind on actual changes due to skipped polls. These issues can significantly degrade the user experience, making the application feel sluggish and unresponsive.

Specific Scenarios and Their Requirements

  • Small Repositories (< 100 files): In small repositories, the existing 2-second interval works reasonably well. The goal is to maintain this responsiveness without introducing unnecessary overhead.
  • Large Repositories (> 5,000 files): For larger repositories, Git operations can take anywhere from 3 to 10 seconds or even longer. In such cases, an adaptive backoff mechanism is crucial to prevent process overlap and CPU saturation.
  • Idle Worktrees: When a worktree is idle, i.e., it has seen no changes for an extended period (e.g., 5+ minutes), there is no need to poll as frequently. Lengthening the polling interval to, say, 30 seconds can significantly conserve resources.

Expected Improvements

By implementing adaptive backoff and congestion control, Canopy aims to achieve the following improvements:

  • Enhanced Responsiveness: The application should remain responsive even when working with large repositories.
  • Reduced Resource Consumption: CPU and battery usage should be minimized, especially in idle worktrees.
  • Timely Updates: The UI should reflect the actual status of the repository without significant delays.

Context: WorktreeMonitor and Its Role

The WorktreeMonitor serves as the cornerstone of Canopy's UI, playing a pivotal role in maintaining an up-to-date view of the repository's state. It achieves this by:

  • Polling git status via getWorktreeChangesWithStats() in electron/utils/git.ts.
  • Running git diff --numstat HEAD to calculate insertion/deletion statistics.
  • Triggering AI summary generation on state changes.
  • Emitting IPC events to update the UI, specifically WorktreeCard components.

Performance Bottlenecks in git.ts

While efforts have been made to optimize Git operations, performance bottlenecks persist, particularly in the following areas:

  • getWorktreeChangesWithStats currently limits numstat to 100 files (line 153), which helps mitigate the impact of large repositories but doesn't eliminate the problem entirely.
  • Untracked files are batched with a concurrency limit of 10 (line 326), which improves efficiency but doesn't address the issue of slow Git operations.
  • Crucially, there's no backoff mechanism in place when Git operations themselves are slow. This lack of backoff is a significant contributor to the performance issues experienced with large repositories.

Deliverables: The Path Forward

The implementation of adaptive backoff and congestion control requires a multifaceted approach, encompassing code changes, testing, and documentation. The deliverables outlined below provide a roadmap for achieving these goals.

Code Changes: Modifying the Polling Logic

The primary focus of the code changes will be on replacing the fixed interval polling mechanism with an adaptive scheduler. This involves modifying the WorktreeMonitor.ts file to incorporate the new logic.

Files to Modify

  • electron/services/WorktreeMonitor.ts: This file is the central hub for the changes, as it contains the existing polling logic that needs to be replaced.
  • electron/utils/git.ts: This file will be modified to optionally add operation duration tracking, providing valuable data for the adaptive backoff mechanism.
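Duration tracking could be as small as a stopwatch helper. The sketch below is one possible shape, assuming nothing about the existing git.ts API: startTimer and the Clock type are hypothetical names introduced here for illustration, and the clock is injectable so the helper can be tested without real delays.

```typescript
// Hypothetical helper, not part of the existing git.ts.
type Clock = () => number;

// Starts a stopwatch and returns a function that reports elapsed milliseconds.
function startTimer(now: Clock = Date.now): () => number {
  const startedAt = now();
  // Clamp at zero in case the system clock moves backwards.
  return () => Math.max(0, now() - startedAt);
}

// Usage sketch (the call being measured is whatever Git operation runs here):
//   const stop = startTimer();
//   await someGitOperation();
//   const durationMs = stop();

// Demonstration with a fake clock.
let fakeNow = 1000;
const stop = startTimer(() => fakeNow);
fakeNow = 4500;
console.log(stop()); // elapsed time according to the fake clock
```

Returning the duration alongside the result, rather than storing it in module state, keeps the measurement tied to a specific worktree when several monitors run at once.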

Proposed Architecture

The proposed architecture involves replacing the fixed setInterval() with a self-scheduling pattern. This pattern allows the system to dynamically adjust the polling interval based on the performance of the Git operations.

The following code snippet illustrates the proposed self-scheduling pattern:

// WorktreeMonitor.ts
private pollingInterval: number = 2000; // Base interval (ms)
private maxPollingInterval: number = 30000; // Max quiet interval (ms)
private failureCount: number = 0;
private lastOperationDuration: number = 0; // Duration of the last poll (ms)

private scheduleNextPoll(): void {
  if (!this.isRunning || !this.pollingEnabled) return;

  // Cooldown: wait at least 1.5x the last operation duration,
  // but never less than the base interval or more than the maximum
  const cooldown = this.lastOperationDuration * 1.5;
  const nextInterval = Math.min(
    Math.max(this.pollingInterval, cooldown),
    this.maxPollingInterval
  );

  this.pollingTimer = setTimeout(() => {
    void this.poll();
  }, nextInterval);
}

private async poll(): Promise<void> {
  const startTime = Date.now();

  try {
    await this.updateGitStatus();
    this.failureCount = 0; // Reset on success
  } catch (error) {
    this.failureCount++;
  } finally {
    // Record the duration for failed polls too, so slow failures still back off
    this.lastOperationDuration = Date.now() - startTime;
  }

  // Circuit breaker: after 3 consecutive failures, switch to error mode
  if (this.failureCount >= 3) {
    this.state.mood = "error";
    this.emitUpdate();
    this.stopPolling(); // Stop until manual refresh
    return; // Do not schedule another poll
  }

  this.scheduleNextPoll(); // Schedule the next poll AFTER the current one completes
}

Adaptive Interval Logic

The adaptive interval logic operates on the following principles:

  1. Cooldown Mode: The next poll waits for at least 1.5 times the lastOperationDuration. This ensures that the system doesn't overload itself with frequent polls when Git operations are taking longer.
  2. Quiet Mode (optional): If no changes are detected for an extended period (e.g., 5+ minutes), the interval is increased to 30 seconds. This conserves resources in idle worktrees.
  3. Circuit Breaker: After 3 consecutive failures, the system sets the mood to "error" and pauses polling. This prevents the system from continuously attempting to poll when there are persistent issues.
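The first two principles can be expressed as a pure function, which makes the arithmetic easy to check in isolation. This is a sketch under stated assumptions: computeNextInterval, BackoffOptions, and the idleForMs parameter are hypothetical names, and the numeric defaults are the ones quoted in this article.

```typescript
interface BackoffOptions {
  baseIntervalMs: number; // base polling interval
  maxIntervalMs: number;  // upper bound, also used in quiet mode
  cooldownFactor: number; // multiple of the last operation duration to wait
  quietAfterMs: number;   // idle time after which quiet mode applies
}

const DEFAULT_BACKOFF: BackoffOptions = {
  baseIntervalMs: 2000,
  maxIntervalMs: 30000,
  cooldownFactor: 1.5,
  quietAfterMs: 5 * 60 * 1000,
};

// Hypothetical pure function: returns the delay before the next poll.
function computeNextInterval(
  lastOperationDurationMs: number,
  idleForMs: number,
  options: BackoffOptions = DEFAULT_BACKOFF
): number {
  // Quiet mode: an idle worktree is polled at the maximum interval.
  if (idleForMs >= options.quietAfterMs) {
    return options.maxIntervalMs;
  }
  // Cooldown mode: wait at least cooldownFactor x the last duration,
  // clamped between the base and maximum intervals.
  const cooldown = lastOperationDurationMs * options.cooldownFactor;
  return Math.min(Math.max(options.baseIntervalMs, cooldown), options.maxIntervalMs);
}

console.log(computeNextInterval(200, 0));           // small repo: base interval
console.log(computeNextInterval(5000, 0));          // 5s operation: 1.5x cooldown
console.log(computeNextInterval(40000, 0));         // very slow: capped at maximum
console.log(computeNextInterval(200, 6 * 60 * 1000)); // idle for 6 minutes
```

Keeping the calculation free of timers and class state means it can be unit tested with plain inputs, independent of the scheduler that calls it.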

Configuration Options

To provide flexibility and customization, the following configuration options will be added to electron/types/config.ts:

monitor?: {
  pollIntervalActive?: number; // Base interval (default 2000ms)
  pollIntervalBackground?: number; // Unused in adaptive mode
  pollIntervalMax?: number; // Max quiet interval (default 30000ms)
  adaptiveBackoff?: boolean; // Enable adaptive backoff (default true)
}
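Because every field is optional, the monitor needs to fill in defaults somewhere. The following sketch shows one way to do that; resolveMonitorConfig and ResolvedMonitorConfig are hypothetical names, and the default values are taken from the comments in the type above.

```typescript
interface MonitorConfig {
  pollIntervalActive?: number;
  pollIntervalBackground?: number;
  pollIntervalMax?: number;
  adaptiveBackoff?: boolean;
}

interface ResolvedMonitorConfig {
  baseIntervalMs: number;
  maxIntervalMs: number;
  adaptiveBackoff: boolean;
}

// Hypothetical helper: applies the documented defaults to a partial config.
function resolveMonitorConfig(monitor: MonitorConfig = {}): ResolvedMonitorConfig {
  const baseIntervalMs = monitor.pollIntervalActive ?? 2000;
  const requestedMaxMs = monitor.pollIntervalMax ?? 30000;
  return {
    baseIntervalMs,
    // Assumption: a maximum below the base interval is treated as a
    // misconfiguration and raised to the base interval.
    maxIntervalMs: Math.max(baseIntervalMs, requestedMaxMs),
    adaptiveBackoff: monitor.adaptiveBackoff ?? true,
  };
}

console.log(resolveMonitorConfig());
console.log(resolveMonitorConfig({ pollIntervalActive: 5000, pollIntervalMax: 1000 }));
```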

Tests: Ensuring Robustness and Reliability

Testing is a critical component of the implementation process. A comprehensive suite of tests will be developed to ensure the robustness and reliability of the adaptive backoff mechanism.

Types of Tests

  • Unit Tests: Verify that scheduleNextPoll() calculates the correct intervals based on different scenarios.
  • Integration Tests: Simulate slow Git operations and verify that the backoff mechanism functions as expected.
  • Circuit Breaker Tests: Ensure that 3 failures trigger the error state and that the system responds appropriately.
  • Performance Tests: Verify that there are no overlapping Git processes and that the system performs efficiently under various loads.
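As a rough illustration of the "no overlapping Git processes" check, the sketch below replays a self-scheduling loop on a virtual clock and inspects the resulting time spans. The names simulateSelfScheduling, PollSpan, and hasOverlap are hypothetical, and the simulation stands in for the real monitor rather than exercising it.

```typescript
interface PollSpan {
  start: number; // virtual time (ms) at which the poll began
  end: number;   // virtual time (ms) at which the poll finished
}

// Hypothetical simulation: each poll is scheduled only after the previous
// one has finished, using the 1.5x cooldown clamped to [baseMs, maxMs].
function simulateSelfScheduling(
  durationsMs: number[],
  baseMs: number = 2000,
  maxMs: number = 30000,
  factor: number = 1.5
): PollSpan[] {
  const spans: PollSpan[] = [];
  let clock = 0;
  for (const duration of durationsMs) {
    const start = clock;
    const end = start + duration;
    spans.push({ start, end });
    const wait = Math.min(Math.max(baseMs, duration * factor), maxMs);
    clock = end + wait;
  }
  return spans;
}

// Returns true if any poll starts before the previous poll has ended.
function hasOverlap(spans: PollSpan[]): boolean {
  return spans.some((span, i) => i > 0 && span.start < spans[i - 1].end);
}

const spans = simulateSelfScheduling([100, 5000, 12000]);
console.log(spans, hasOverlap(spans));
```

A real integration test would replace the virtual clock with fake timers from the project's test framework and a stubbed Git call that resolves after a chosen delay.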

Documentation: Guiding Users and Developers

Clear and comprehensive documentation is essential for guiding users and developers on how to use and maintain the adaptive backoff mechanism.

Documentation Updates

  • Update CLAUDE.md to explain the adaptive polling behavior.
  • Add JSDoc comments to the code to provide detailed explanations of the self-scheduling pattern and other key components.
  • Document the configuration options in docs/spec.md, providing guidance on how to customize the polling behavior.

Technical Specifications: A Deep Dive

To fully understand the implications of implementing adaptive backoff and congestion control, it’s crucial to delve into the technical specifications. This section provides a detailed look at the footprint, performance considerations, and backward compatibility aspects of the proposed changes.

Footprint: Identifying the Core Modules

The implementation of adaptive backoff primarily affects the following modules:

  • Core Modules: The WorktreeMonitor polling logic is the heart of the changes, as it dictates how and when Git status updates are performed.
  • Git Utilities: Optional duration tracking in getWorktreeChangesWithStats provides valuable data for the adaptive backoff mechanism.
  • Config System: The new monitor.pollIntervalMax option allows users to customize the maximum polling interval in quiet mode.

Performance Considerations: Balancing Responsiveness and Efficiency

Adaptive backoff is designed to strike a balance between responsiveness and efficiency. The performance considerations vary depending on the scenario:

  • Best Case (Small Repo): In small repositories, there should be no noticeable change in behavior. The system will continue to poll every 2 seconds, maintaining responsiveness.
  • Large Repo: In large repositories, the system will automatically adapt to 5-10 second intervals based on Git operation duration. This prevents process overlap and CPU saturation.
  • Idle Repo: In idle repositories, the interval can be increased to 30 seconds after 5 minutes of inactivity, conserving resources.
  • Failed Repo: If Git operations fail repeatedly, the system will stop polling after 3 failures to prevent CPU waste.

Backward Compatibility: Ensuring a Smooth Transition

Backward compatibility is a key concern. The implementation is designed to ensure a smooth transition for existing users:

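  • The base interval remains 2 seconds, so small repositories keep their current behavior. Adaptive backoff is controlled by the adaptiveBackoff configuration option and can be turned off to restore fixed-interval polling.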
  • The existing setPollingInterval() API still works for manual override, providing flexibility for advanced users.

Dependencies: Identifying Blocking and Informational Factors

To ensure a smooth implementation process, it’s essential to identify any dependencies that might impact the work.

Blocking Dependencies

  • Currently, there are no blocking dependencies. The implementation can proceed independently without waiting for other tasks to complete.

Informational Dependencies

  • Issue #92 (Reduce Technical Debt) is an informational dependency. It addresses a scalability risk identified in code review and provides valuable context for the adaptive backoff implementation.

Tasks: A Detailed Action Plan

To ensure a structured and efficient implementation process, the following tasks have been identified:

  • [ ] Modify WorktreeMonitor.ts:926-944 to replace setInterval() with self-scheduling.
  • [ ] Add lastOperationDuration tracking in the poll() method.
  • [ ] Implement adaptive interval calculation in scheduleNextPoll().
  • [ ] Add circuit breaker logic (3 failures → error state).
  • [ ] (Optional) Add "quiet mode" that increases the interval after 5 minutes of inactivity.
  • [ ] Add configuration options to electron/types/config.ts.
  • [ ] Write unit tests for interval calculation logic.
  • [ ] Test on a large repository (e.g., a clone of the Linux kernel).
  • [ ] Update documentation to explain the adaptive behavior.

Acceptance Criteria: Defining Success

To ensure that the implementation meets the desired goals, the following acceptance criteria have been defined:

  • [ ] Git operations never overlap (no concurrent git diff processes).
  • [ ] The polling interval adapts to operation duration (if Git takes 5s, wait 7.5s before the next poll).
  • [ ] The circuit breaker triggers after 3 consecutive failures.
  • [ ] WorktreeMonitor sets the mood to "error" when polling stops.
  • [ ] Manual refresh (Cmd+R) resets the circuit breaker and resumes polling.
  • [ ] No performance regression on small repositories (< 100 files).
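The circuit breaker criteria (trip after 3 consecutive failures, reset on manual refresh) can be isolated in a small class. This is a minimal sketch: PollCircuitBreaker and its method names are hypothetical and do not come from the Canopy codebase.

```typescript
// Hypothetical circuit breaker for the polling loop.
class PollCircuitBreaker {
  private failures = 0;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // A successful poll clears the consecutive-failure count.
  recordSuccess(): void {
    this.failures = 0;
  }

  // Records a failure and reports whether the breaker is now open.
  recordFailure(): boolean {
    this.failures += 1;
    return this.isOpen();
  }

  // Called on manual refresh: closes the breaker so polling can resume.
  reset(): void {
    this.failures = 0;
  }

  isOpen(): boolean {
    return this.failures >= this.threshold;
  }
}

const breaker = new PollCircuitBreaker();
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.isOpen());        // two failures: still closed
console.log(breaker.recordFailure()); // third consecutive failure
breaker.reset();
console.log(breaker.isOpen());        // after manual refresh
```

When the breaker opens, the monitor would set the mood to "error" and stop polling; the refresh handler would call reset() before starting the loop again.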

Edge Cases & Risks: Anticipating Challenges

As with any complex implementation, it’s essential to anticipate potential edge cases and risks. This section outlines the identified risks and edge cases, along with mitigation strategies.

Risks

  1. Delayed UI Updates: If Git is slow, UI updates may lag by 10-20 seconds.
    • Mitigation: Keep the base interval at 2s and only apply backoff when operations actually slow down.
  2. Circuit Breaker Too Aggressive: 3 failures might be too low for flaky networks.
    • Mitigation: Make the failure threshold configurable or use exponential backoff instead.
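The exponential backoff alternative mentioned in the second mitigation could look like the sketch below. The function name failureBackoffMs is hypothetical, and the doubling factor is an assumption chosen for illustration; the base and maximum reuse the intervals quoted earlier.

```typescript
// Hypothetical alternative to a hard stop: double the delay after each
// consecutive failure, capped at maxMs.
function failureBackoffMs(
  failureCount: number,
  baseMs: number = 2000,
  maxMs: number = 30000
): number {
  if (failureCount <= 0) {
    return baseMs;
  }
  // Cap the exponent so the intermediate value stays a safe number.
  const exponent = Math.min(failureCount, 30);
  return Math.min(baseMs * 2 ** exponent, maxMs);
}

for (let failures = 0; failures <= 5; failures += 1) {
  console.log(failures, failureBackoffMs(failures));
}
```

Compared with a fixed threshold, this keeps retrying at a decreasing rate, so a transient problem recovers without a manual refresh, at the cost of never fully stopping.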

Edge Cases

  • Transient Git Lock: If an index.lock collision happens, don't increment the failure count.
  • Worktree Deleted: The circuit breaker should distinguish between "slow" and "gone" worktrees.
  • Manual Refresh: A user pressing "Refresh" should reset the failure count and backoff.
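One way to handle the first two edge cases is to classify a failure before deciding whether it counts toward the circuit breaker. The sketch below matches on error message text, which is an assumption: the substrings are based on common Git and filesystem error messages, and the names classifyGitFailure and countsTowardBreaker are hypothetical. How Canopy's Git layer actually surfaces errors is not shown in this article.

```typescript
type GitFailureKind = "transient-lock" | "worktree-gone" | "failure";

// Hypothetical classifier based on error message substrings.
function classifyGitFailure(message: string): GitFailureKind {
  const text = message.toLowerCase();
  // Another Git process holds the index lock; retrying later usually works.
  if (text.includes("index.lock")) {
    return "transient-lock";
  }
  // The worktree directory or its repository metadata no longer exists.
  if (
    text.includes("not a git repository") ||
    text.includes("enoent") ||
    text.includes("no such file or directory")
  ) {
    return "worktree-gone";
  }
  return "failure";
}

// Only genuine failures increment the circuit breaker count in this sketch.
function countsTowardBreaker(kind: GitFailureKind): boolean {
  return kind === "failure";
}

const lockError = "fatal: Unable to create '/repo/.git/index.lock': File exists.";
console.log(classifyGitFailure(lockError), countsTowardBreaker(classifyGitFailure(lockError)));
```

A "worktree-gone" result would typically stop the monitor for that worktree outright instead of waiting for three failures.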

Alternatives Considered: Evaluating Other Options

Before settling on the adaptive backoff approach, several alternatives were considered. This section outlines these alternatives and the rationale for choosing adaptive backoff.

  1. Fixed Long Interval (10s): This approach is simple but sacrifices responsiveness on small repositories.
  2. File Watcher + On-Demand Polling: This eliminates polling but requires chokidar integration, adding complexity.
  3. Process Queue with Concurrency Limit: This doesn't solve the root cause of slow Git operations.

Conclusion: A Smarter Approach to Git Polling

Implementing adaptive backoff and congestion control for Git polling represents a significant step forward in optimizing Canopy’s UI. By dynamically adjusting the polling interval based on Git operation performance, Canopy can avoid the pitfalls of fixed-interval polling, such as process overlap, CPU spikes, and battery drain. This smarter approach ensures a more responsive and efficient user experience, particularly when working with large repositories. This proactive solution not only addresses current performance bottlenecks but also lays the groundwork for future scalability and efficiency improvements.

For more in-depth information on Git performance and optimization, consider exploring resources like the Pro Git book, a comprehensive guide to Git that covers various aspects of Git performance tuning.