Reproducibility Check Failed: `dev-MC_100km_jra_ryf+wombatlite`

by Alex Johnson

We've encountered a failure in the scheduled reproducibility check for the configuration dev-MC_100km_jra_ryf+wombatlite within the ACCESS-NRI/access-om3-configs repository. This article details the specifics of the failure, the resources for investigation, and the next steps for resolution.

Understanding Reproducibility Checks

Reproducibility in scientific computing, especially in climate modeling, is the cornerstone of reliable research. A reproducible experiment ensures that, given the same inputs and environment, the same results can be obtained consistently. Scheduled reproducibility checks are automated processes designed to verify this consistency in our model configurations.

When these checks fail, it indicates a discrepancy between expected and actual results. This could stem from various factors, such as changes in the code, environment, or input data. Identifying the root cause is crucial for maintaining the integrity and reliability of our models.

Reproducibility checks are not just about ensuring the same numerical outputs; they are about validating the entire scientific workflow. This includes the model setup, execution, and post-processing. A failure in any of these steps can compromise the reproducibility of the results.

For the ACCESS-NRI team, these checks are a critical part of our continuous integration and continuous deployment (CI/CD) pipeline. They help us catch issues early in the development process, preventing them from propagating into production systems. By addressing these failures promptly, we can maintain the high standards of our modeling work.

Details of the Failure

The failure occurred during a scheduled reproducibility check on the ACCESS-NRI/access-om3-configs repository. The specific configuration that failed is dev-MC_100km_jra_ryf+wombatlite. This configuration is a development branch, indicating it's a work-in-progress and potentially subject to changes.

The key components involved in this failure are:

  • Model: access-om3 (https://github.com/ACCESS-NRI/access-om3)
    • The access-om3 model is a central part of the ACCESS-NRI suite, focusing on ocean and sea-ice modeling. It simulates the ocean, sea ice, and their exchanges with a prescribed atmosphere, and understanding its behavior is vital for climate prediction and research.
    • The model's development involves continuous updates and modifications, making reproducibility checks essential. These checks ensure that new features and optimizations do not introduce unintended side effects or inconsistencies in the model's output.
    • The GitHub repository serves as the central hub for the model's codebase, documentation, and issue tracking. This transparency allows for collaborative development and rigorous testing. The model's complexity necessitates a robust testing framework, including reproducibility checks, to maintain its reliability.
  • Config Repo: access-om3-configs (https://github.com/ACCESS-NRI/access-om3-configs)
    • The access-om3-configs repository holds the various configurations used to run the access-om3 model. These configurations define the model's parameters, input data, and runtime settings. Each configuration represents a specific experimental setup, tailored to address different research questions.
    • The repository's structure allows for easy management and version control of these configurations. This is crucial for maintaining a consistent and reproducible modeling environment. The configurations are organized into branches and directories, reflecting the different development stages and experimental designs.
    • The configurations repository is as important as the model code itself. It ensures that experiments can be replicated and validated by other researchers. Reproducibility checks within this repository verify that the configurations produce consistent results across different runs and environments.
  • Config Ref Tested: dev-MC_100km_jra_ryf+wombatlite (https://github.com/ACCESS-NRI/access-om3-configs/tree/dev-MC_100km_jra_ryf+wombatlite)
    • The dev-MC_100km_jra_ryf+wombatlite configuration is a specific development branch within the access-om3-configs repository. It represents a particular setup of the access-om3 model, potentially with new features or modifications under testing.
    • The branch name indicates that it is a development version, suggesting that it may be subject to frequent changes. This makes reproducibility checks even more critical, as they help identify any inconsistencies introduced during development.
    • The configuration details encoded in the name, such as the nominal 100 km (roughly 1°) grid resolution, the JRA55-do repeat-year forcing (jra_ryf), and the WOMBAT-lite ocean biogeochemistry component (wombatlite), provide insight into the experimental design. Understanding these details is crucial for diagnosing the cause of the reproducibility failure.
  • Failed Run Log: https://github.com/ACCESS-NRI/access-om3-configs/actions/runs/19805067703
    • The failed run log is the primary source of information for diagnosing the reproducibility failure. It contains detailed information about the model execution, including any error messages, warnings, and performance metrics.
    • Analyzing the log requires a systematic approach, focusing on identifying the point of divergence between the expected and actual results. This may involve examining the model's output at different time steps and comparing it against the reference checksums.
    • The log also provides valuable context, such as the environment in which the model was run and the specific versions of the software and libraries used. This information is crucial for reproducing the failure and identifying its root cause.
  • Experiment Location (Gadi): /scratch/tm70/repro-ci/experiments/access-om3-configs/dev-MC_100km_jra_ryf+wombatlite
    • The experiment location on Gadi, the National Computational Infrastructure (NCI) supercomputer, provides access to the model's output data and runtime environment. This allows for a detailed examination of the model's behavior and the factors that may have contributed to the failure.
    • The directory structure within the experiment location typically includes input data, output files, and log files. These resources are essential for reproducing the failure and identifying its cause.
    • The Gadi environment is carefully managed to ensure consistency and reliability. However, subtle differences in the environment, such as library versions or system configurations, can sometimes impact the reproducibility of model runs.
  • Checksums:
    • Created in the testing/checksum directory
      • Checksums play a vital role in reproducibility checks. They are calculated from the model's output data and serve as a fingerprint of the results. Comparing checksums from different runs provides a quantitative measure of reproducibility.
      • The testing/checksum directory typically contains a set of checksum files, each corresponding to a specific output variable or time step. These files are generated during a successful run and serve as the baseline for comparison.
      • The checksum calculation method is carefully chosen to be sensitive to small differences in the model's output. This ensures that even minor inconsistencies are detected and flagged as potential reproducibility issues.
    • Compared against: https://github.com/ACCESS-NRI/access-om3-configs/tree/dev-MC_100km_jra_ryf+wombatlite/testing/checksum
      • The checksums in the testing/checksum directory of the dev-MC_100km_jra_ryf+wombatlite branch serve as the reference point for the reproducibility check. The model's output from a new run is compared against these checksums to detect any discrepancies.
      • The comparison process involves calculating checksums from the new run's output and comparing them to the reference checksums. If any checksum differs, the reproducibility check is considered to have failed; a minimal comparison sketch is shown after this list.
      • The use of checksums ensures that the reproducibility check is objective and quantitative. It eliminates the need for manual inspection of the model's output, which can be time-consuming and prone to human error.
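
As a concrete illustration, the sketch below compares a new run's checksums against the reference set. It assumes, purely for illustration, that the checksums are stored as JSON files mapping output field names to checksum strings; the actual file names and format used in testing/checksum may differ, and the paths shown are placeholders.

```python
"""Minimal sketch: compare a new run's checksums against the reference set.

Assumptions (not taken from the original post): checksum files are JSON
documents mapping output field names to checksum strings, and the paths
below are placeholders for whatever testing/checksum actually contains.
"""
import json
from pathlib import Path

# Hypothetical paths -- substitute the real reference file from
# testing/checksum and the failed run's output on Gadi.
REFERENCE = Path("testing/checksum/reference-checksums.json")
NEW_RUN = Path("new-run-checksums.json")


def load_checksums(path: Path) -> dict:
    """Load a checksum file into a flat {field: checksum} dictionary."""
    with path.open() as f:
        return json.load(f)


def diff_checksums(reference: dict, new: dict) -> None:
    """Report fields that are missing or whose checksums do not match."""
    for field, ref_value in sorted(reference.items()):
        new_value = new.get(field)
        if new_value is None:
            print(f"MISSING  {field}")
        elif new_value != ref_value:
            print(f"MISMATCH {field}: {ref_value} -> {new_value}")
    for field in sorted(set(new) - set(reference)):
        print(f"EXTRA    {field}")


if __name__ == "__main__":
    diff_checksums(load_checksums(REFERENCE), load_checksums(NEW_RUN))
```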

Investigating the Failure

To effectively investigate this failure, a systematic approach is essential. Here’s a breakdown of the recommended steps:

  1. Examine the Failed Run Log: The failed run log is the first point of investigation. It often contains error messages or warnings that provide clues about the cause of the failure. Look for exceptions, segmentation faults, or other unexpected behavior, as well as resource limitations such as memory or disk-space exhaustion that could have contributed to the failure (a log-scanning sketch follows this list).
  2. Compare Checksums: Identify which checksums failed the comparison. This will pinpoint the specific output variables or time steps where the model's behavior diverged from the expected results. A detailed comparison of the failed checksums can reveal patterns that provide further insight: for example, if checksums match for early time steps but diverge later, the point of first divergence suggests where the solutions drift apart.
  3. Review Code Changes: Since dev-MC_100km_jra_ryf+wombatlite is a development branch, recent code changes are a likely source of the failure. Review the commit history for this branch to identify any modifications that might have introduced inconsistencies. Pay close attention to changes related to the affected output variables or the model's core algorithms. It's also important to consider changes in dependencies, such as libraries or compilers, that could have unintended side effects.
  4. Check Environment Differences: The computing environment can significantly impact the reproducibility of model runs. Verify that the environment used for the failed run is identical to the environment used to generate the reference checksums, including the operating system, compiler versions, libraries, and any other relevant software. Subtle differences in the environment can lead to numerical variations that cause checksum mismatches. Containerization or curated software environments (for example, Apptainer/Singularity on HPC systems rather than Docker) can help ensure consistency across runs (an environment-snapshot sketch follows this list).
  5. Reproduce the Failure: Attempt to reproduce the failure locally or in a controlled environment. This will allow for more detailed debugging and analysis. Use the same input data, configuration files, and environment settings as the failed run. If the failure can be reproduced, it's much easier to experiment with different fixes and identify the root cause.
  6. Consult with Experts: If the cause of the failure is not immediately apparent, consult with experts in the relevant areas of the model or the computing environment. They may have insights or suggestions that can help narrow down the possibilities. Collaboration and knowledge sharing are essential for resolving complex reproducibility issues.
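
For step 1, a simple keyword scan can surface the most relevant lines of a long log before reading it end to end. The sketch below is a minimal example under stated assumptions: it presumes the log has been saved locally as failed-run.log (a placeholder name), and the pattern list is illustrative rather than exhaustive.

```python
"""Minimal sketch: scan a run log for common failure indicators.

Assumptions (not from the original post): the log has been downloaded as a
plain-text file named failed-run.log; the keyword list is illustrative only.
"""
import re
from pathlib import Path

# Hypothetical local copy of the failed GitHub Actions / PBS log.
LOG_FILE = Path("failed-run.log")

# Patterns worth flagging first when triaging a reproducibility failure.
PATTERNS = [
    r"error",
    r"traceback",
    r"segmentation fault",
    r"killed",
    r"out of memory|oom",
    r"checksum.*(mismatch|differ)",
]


def scan_log(path: Path) -> None:
    """Print every line (with its line number) that matches a pattern."""
    regex = re.compile("|".join(PATTERNS), re.IGNORECASE)
    for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
        if regex.search(line):
            print(f"{lineno:6d}: {line.strip()}")


if __name__ == "__main__":
    scan_log(LOG_FILE)
```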
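
For step 4, recording a snapshot of the software environment for each run makes later comparisons straightforward. The sketch below is one possible approach, assuming an environment-modules system (which sets LOADEDMODULES, as on Gadi) and a Python environment worth recording; the output filename is a placeholder.

```python
"""Minimal sketch: record the software environment of a run for later comparison.

Assumptions (not from the original post): environment modules are in use
(so LOADEDMODULES is set) and the run involves a Python environment; the
output filename is a placeholder.
"""
import json
import os
import platform
from importlib import metadata

snapshot = {
    "hostname": platform.node(),
    "platform": platform.platform(),
    "python": platform.python_version(),
    # LOADEDMODULES is set by the environment-modules system, if present.
    "modules": sorted(os.environ.get("LOADEDMODULES", "").split(":")),
    "packages": sorted(
        f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()
    ),
}

# Write one snapshot per run; diff two snapshots to spot environment drift.
with open("environment-snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```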

Next Steps and Resolution

Once the cause of the failure has been identified, the next step is to implement a solution. This might involve fixing a bug in the code, adjusting the configuration, or modifying the environment. The specific steps will depend on the nature of the failure.

After implementing a fix, it's crucial to re-run the reproducibility check to verify that the issue has been resolved. This will ensure that the model's behavior is consistent and reliable. If the check passes, the fix can be merged into the main branch or deployed to production.

In some cases, the failure might be due to a transient issue or non-deterministic behavior in the model. In these situations, it may be necessary to run the reproducibility check multiple times to confirm that the failure is not a fluke. Because checksums either match or they do not, statistical methods are better applied to the underlying output fields to assess whether any differences are within acceptable limits.

To ensure the long-term reproducibility of our models, it's essential to establish robust testing and validation procedures. This includes not only scheduled reproducibility checks but also other forms of testing, such as unit tests and integration tests. By continuously monitoring the model's behavior and addressing any issues promptly, we can maintain the integrity and reliability of our scientific results.
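
To make such checks routine, reproducibility comparisons can also be expressed as automated tests. The sketch below is a minimal pytest-style example using the same hypothetical JSON checksum format as the earlier sketch; it is illustrative only and is not the actual test suite used for these scheduled checks.

```python
"""Minimal pytest-style reproducibility test, assuming the hypothetical JSON
checksum format from the earlier sketch; file names are placeholders."""
import json
from pathlib import Path

import pytest

REFERENCE = Path("testing/checksum/reference-checksums.json")
NEW_RUN = Path("new-run-checksums.json")


@pytest.mark.skipif(not NEW_RUN.exists(), reason="no new run output to compare")
def test_checksums_match_reference():
    reference = json.loads(REFERENCE.read_text())
    new = json.loads(NEW_RUN.read_text())
    # Bitwise reproducibility: every reference checksum must match exactly.
    assert new == reference
```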

Tagging @ACCESS-NRI/model-release to bring this issue to their attention and initiate the resolution process.

Conclusion

The scheduled reproducibility check failure for the dev-MC_100km_jra_ryf+wombatlite configuration highlights the importance of rigorous testing in scientific computing. By systematically investigating the failure and implementing a solution, we can ensure the reliability and integrity of our models. This not only strengthens our research but also builds confidence in the predictions and insights derived from these models. Remember, the goal is not just to fix the immediate issue but also to learn from it and improve our processes for the future.

For more information on best practices in scientific computing and reproducibility, consider exploring resources from organizations like the Software Sustainability Institute. They offer valuable guidance and tools for ensuring the quality and reliability of research software.