Playwright Bug: wait_for_selector on Detached Frame

by Alex Johnson

Introduction

This article examines a bug in Playwright's wait_for_selector method that appears when the target frame is detached from the DOM while the wait is still pending. Playwright is an automation library widely used for end-to-end testing and web scraping, so error paths like this one matter for the reliability of automation scripts. The issue was observed with Playwright version 1.56 in both the .NET and Python implementations.

The root cause lies in how Playwright's internals handle the frame's context after detachment: the context is undefined, and the code still tries to read a property of it. Instead of the intended "Frame was detached" error, the caller sees an unrelated error about reading a property of undefined. The sections below cover the reproduction script, the expected and actual output, the code path where the undefined context is accessed, and two candidate fixes.

System Information

  • Playwright Version: v1.56
  • Operating System: Windows 11
  • Browser: Chromium/Chrome
  • Programming Languages: .NET and Python
  • Issue Origin: Test case from page-wait-for-selector-1.spec.ts (expected to throw an error when a frame is detached)

Source Code

The following Python code demonstrates the bug:

import asyncio
from patchright.async_api import async_playwright, Error as PlaywrightError
#from playwright.async_api import async_playwright, Error as PlaywrightError

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel="chrome",
            headless=False
        )
        context = await browser.new_context(viewport=None)
        page = await context.new_page()
        
        url = "https://www.google.com/blank.html"
        frame_id = "frame1"
        
        handle = await page.evaluate_handle("""async ({frameId, url}) => {
          const frame = document.createElement('iframe');
          frame.src = url;
          frame.id = frameId;
          document.body.appendChild(frame);
          await new Promise(x => frame.onload = x);
          return frame
        }""", {"frameId": frame_id, "url": url})
        
        content = await handle.content_frame()
        
        frame = next((f for f in page.frames if f.parent_frame == page.main_frame), None)
        
        wait_task = asyncio.create_task(frame.wait_for_selector(".box"))
        
        await page.evaluate("""function detachFrame(frameId) {
          const frame = document.getElementById(frameId);
          frame.remove();
        }""", frame_id)
        
        wait_exception = None
        try:
            await wait_task
        except Exception as ex:
            wait_exception = ex
        
        print(f"error message = {wait_exception}")
        
        await browser.close()

asyncio.run(main())

Code Explanation

The provided Python code uses Playwright to automate browser interactions. Let's break down the code section by section to understand its functionality and how it triggers the bug.

  1. Importing Libraries: The code begins by importing necessary libraries. asyncio is used for asynchronous programming, which is essential for Playwright's non-blocking operations. async_playwright and Error are imported from the patchright.async_api module (or playwright.async_api if the patch is not applied). This sets the stage for using Playwright's asynchronous API.

  2. Defining the Main Function: The core logic resides within the main function, which is defined as an asynchronous function using async def main():. This is crucial for utilizing Playwright's asynchronous methods. The async with async_playwright() as p: statement initializes Playwright. The async with construct ensures that Playwright resources are properly managed, especially the browser processes, and automatically cleaned up when the block is exited.

  3. Launching the Browser: Inside the Playwright context, the code launches a Chromium browser instance using browser = await p.chromium.launch(channel="chrome", headless=False). The channel="chrome" argument specifies the use of the Chrome browser channel (if available), and headless=False indicates that the browser should be launched in a visible, non-headless mode. This is beneficial for debugging and observing the browser's behavior during script execution.

  4. Creating Context and Page: A new browser context is created using context = await browser.new_context(viewport=None). A browser context is an isolated browsing session, which is useful for managing different user profiles or avoiding shared state between tests. The viewport=None argument ensures that the page viewport is not explicitly set, allowing the browser to use its default viewport size. A new page is then created within this context using page = await context.new_page(). The page object represents a single tab or window in the browser and provides methods for interacting with web content.

  5. Loading Content via Iframe: The code proceeds to load content into an iframe. It first defines a URL (url = "https://www.google.com/blank.html") and an iframe ID (frame_id = "frame1"). Then, it uses page.evaluate_handle to execute JavaScript code within the page's context. This JavaScript code creates an iframe element, sets its src attribute to the specified URL, appends it to the document body, and waits for the iframe to load. The await new Promise(x => frame.onload = x); line ensures that the script waits for the iframe's onload event before proceeding, guaranteeing that the iframe content is fully loaded. The page.evaluate_handle method returns a handle to the created iframe element, which can be used for further interaction.

  6. Accessing the Iframe's Content Frame: content = await handle.content_frame() retrieves the Frame object for the iframe's content, although the script does not use content afterwards. Instead, frame = next((f for f in page.frames if f.parent_frame == page.main_frame), None) looks the frame up again by scanning page.frames for the one whose parent is the main frame. The None default means frame would be None if no child frame were found.

  7. Setting up the Wait Task: An asyncio task is created to wait for a selector inside the iframe using wait_task = asyncio.create_task(frame.wait_for_selector(".box")). By default, frame.wait_for_selector(".box") waits until an element matching the CSS selector ".box" is visible in the iframe. In the Python API the call returns a coroutine that completes when the element is found and raises if the timeout is reached or the frame is detached. asyncio.create_task schedules that coroutine as a task so that it runs concurrently with the rest of the script.

  8. Detaching the Iframe: The core of the bug demonstration lies in detaching the iframe from the DOM. The code executes JavaScript within the page's context using await page.evaluate("""function detachFrame(frameId) { const frame = document.getElementById(frameId); frame.remove(); }""", frame_id). This JavaScript function retrieves the iframe element by its ID and then removes it from the DOM using frame.remove(). Detaching the iframe while the wait_for_selector task is pending triggers the bug.

  9. Handling the Exception: The script expects wait_for_selector to fail because the frame was detached, so await wait_task sits inside a try...except block. The except Exception as ex: clause stores whatever is raised in wait_exception, and print(f"error message = {wait_exception}") shows its message, which is the key piece of evidence for diagnosing the bug. When the bug is triggered, that message refers to reading a property of an undefined context rather than to the detached frame.

  10. Closing the Browser: Finally, the browser is closed using await browser.close(). This ensures that all browser processes are terminated, and resources are released.
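The timing in steps 7 to 9 can be exercised without a browser. The sketch below is a stand-in under stated assumptions: FakeFrame and its detach method are hypothetical and are not part of Playwright; they only mimic a wait that fails once the frame goes away.

```python
import asyncio


class FakeFrame:
    """Hypothetical stand-in for a frame; not part of Playwright."""

    def __init__(self):
        self._detached = asyncio.Event()

    async def wait_for_selector(self, selector):
        # The selector never appears in this sketch, so the wait only ends
        # when the frame is detached.
        await self._detached.wait()
        raise RuntimeError(f"Frame was detached while waiting for {selector!r}")

    def detach(self):
        self._detached.set()


async def run_scenario():
    frame = FakeFrame()
    # Start the wait in the background, as the repro script does.
    wait_task = asyncio.create_task(frame.wait_for_selector(".box"))
    # Yield once so the task is actually waiting before the detach happens.
    await asyncio.sleep(0)
    frame.detach()

    wait_exception = None
    try:
        await wait_task
    except Exception as ex:
        wait_exception = ex
    return wait_exception


if __name__ == "__main__":
    print(f"error message = {asyncio.run(run_scenario())}")
```

The point of the pattern is that the failure surfaces at await wait_task, not at the call that detaches the frame, which is why the repro script wraps only that line in try...except.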

Steps to Reproduce

  1. Run the provided Python script.

Expected Behavior

Without the patched driver, the script prints the expected error:

error message = Frame.wait_for_selector: Frame was detached
Call log:
  - waiting for locator(".box") to be visible

Actual Behavior

With the patched driver, the script instead prints:

error message = Frame.wait_for_selector: Cannot read properties of undefined (reading 'injectedScript')
Call log:
  - waiting for locator(".box") to be visible
  - waiting for locator(".box")
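Because the two outputs differ mainly in the message text, a regression check can key on that text. The helper below is a hypothetical sketch (classify_wait_error is not a Playwright API); the two marker strings are copied from the outputs in this article.

```python
DETACHED_MARKER = "Frame was detached"
BUG_MARKER = "Cannot read properties of undefined (reading 'injectedScript')"


def classify_wait_error(message: str) -> str:
    """Return 'bug', 'expected', or 'other' for a wait_for_selector error message."""
    if BUG_MARKER in message:
        return "bug"
    if DETACHED_MARKER in message:
        return "expected"
    return "other"


if __name__ == "__main__":
    expected = "Frame.wait_for_selector: Frame was detached"
    actual = ("Frame.wait_for_selector: Cannot read properties of undefined "
              "(reading 'injectedScript')")
    print(classify_wait_error(expected))
    print(classify_wait_error(actual))
```

In the repro script, such a helper could be applied to str(wait_exception) so that a run fails loudly when the misleading message comes back.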

Root Cause Analysis

The core issue lies in the resolveInjectedForSelector function within lib/server/frameSelectors.js, specifically around line 163:

async resolveInjectedForSelector(selector, options, scope) {
  const resolved = await this.resolveFrameForSelector(selector, options, scope);
  if (!resolved)
    return;
  const context = await resolved.frame._context(options?.mainWorld ? "main" : resolved.info.world);
  const injected = await context.injectedScript();  // context is undefined here when frame detached
  return { injected, info: resolved.info, frame: resolved.frame, scope: resolved.scope };
}

The problem occurs because the context is undefined when the frame is detached. This leads to an attempt to read the injectedScript property of an undefined object, resulting in the error message: Cannot read properties of undefined (reading 'injectedScript').
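The same failure mode is easy to reproduce in isolation. The following Python analogue is only an illustration, not Playwright code: FakeFrame, FakeContext, and resolve_injected are hypothetical names, and Python raises AttributeError where JavaScript reports the error quoted above.

```python
class FakeContext:
    """Hypothetical execution context; not part of Playwright."""

    def injected_script(self):
        return "injected-script-handle"


class FakeFrame:
    """Hypothetical frame whose context lookup yields None once detached."""

    def __init__(self, detached=False):
        self.detached = detached

    def context(self):
        return None if self.detached else FakeContext()


def resolve_injected(frame):
    # Mirrors the unguarded access: no check that the context exists.
    context = frame.context()
    return context.injected_script()


if __name__ == "__main__":
    print(resolve_injected(FakeFrame()))
    try:
        resolve_injected(FakeFrame(detached=True))
    except AttributeError as ex:
        print(f"unguarded access failed: {ex}")
```

The error raised here says nothing about the frame being detached, which is the same loss of information the bug report describes.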

Proposed Solution

The issue can be resolved by adding a check to ensure that the context is truthy before attempting to access its injectedScript property. There are two ways to address this:

  1. Conditional Check: Add an if condition that checks whether context is truthy and returns early (yielding undefined) when it is not.

    async resolveInjectedForSelector(selector, options, scope) {
      const resolved = await this.resolveFrameForSelector(selector, options, scope);
      if (!resolved)
        return;
      const context = await resolved.frame._context(options?.mainWorld ? "main" : resolved.info.world);
      if (!context)
        return;
      const injected = await context.injectedScript();
      return { injected, info: resolved.info, frame: resolved.frame, scope: resolved.scope };
    }
    
  2. Optional Chaining: Use optional chaining (?.) to safely access the injectedScript property. This approach also ensures that the correct error message is propagated to the caller.

    const injected = await context?.injectedScript();
    

Implementation with Optional Chaining

The recommended fix involves using optional chaining:

async resolveInjectedForSelector(selector, options, scope) {
  const resolved = await this.resolveFrameForSelector(selector, options, scope);
  if (!resolved)
    return;
  const context = await resolved.frame._context(options?.mainWorld ? "main" : resolved.info.world);
  const injected = await context?.injectedScript();
  return { injected, info: resolved.info, frame: resolved.frame, scope: resolved.scope };
}

This change ensures that if context is undefined, the injectedScript() method will not be called, and injected will be assigned undefined. This prevents the Cannot read properties of undefined error and allows the correct error message (Frame was detached) to be propagated to the caller.
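How a guard lets the intended error surface can be modelled in a few lines. This Python analogue rests on an assumption that the article states but does not show: that a later step in the caller reports the detachment once injected comes back empty. All names (FakeFrame, FakeContext, FrameDetachedError, resolve_injected_guarded, wait_for_selector) are hypothetical and are not Playwright internals.

```python
class FrameDetachedError(Exception):
    """Hypothetical error type used only in this sketch."""


class FakeContext:
    """Hypothetical execution context; not part of Playwright."""

    def injected_script(self):
        return "injected-script-handle"


class FakeFrame:
    """Hypothetical frame whose context lookup yields None once detached."""

    def __init__(self, detached=False):
        self.detached = detached

    def context(self):
        return None if self.detached else FakeContext()


def resolve_injected_guarded(frame):
    context = frame.context()
    # Python counterpart of `context?.injectedScript()`: skip the call when
    # there is no context and hand back None instead of raising.
    return context.injected_script() if context is not None else None


def wait_for_selector(frame, selector):
    injected = resolve_injected_guarded(frame)
    if injected is None:
        # Assumed caller behaviour: report the detachment explicitly.
        raise FrameDetachedError("Frame was detached")
    return f"{injected}:{selector}"
```

Under that assumption, the guard itself raises nothing; it only keeps the low-level property access from masking the error that the caller is meant to report.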

Impact and Conclusion

This bug replaces a clear, actionable error with a misleading one, which matters most in scripts that work with dynamic content and frames that come and go. With the proposed fix in place, a detached frame is reported as Frame was detached instead of as an internal property-access error, which makes failures easier to diagnose and keeps Playwright-based automation more reliable and maintainable.

In conclusion, the wait_for_selector bug on detached frames highlights the importance of careful error handling in automation libraries. Optional chaining offers a small, targeted fix that prevents the spurious error and preserves the intended error reporting.

For more information on Playwright and its features, see the official Playwright documentation.