ONNX Export Error At 320x192 Resolution In FoundationStereo

Dec 1, 2025 by Alex Johnson

Introduction

This article examines an error encountered while exporting a FoundationStereo model to ONNX (Open Neural Network Exchange). The user reported that the export fails at a resolution of 320x192, even though both dimensions satisfy the requirement of being multiples of 32. Exports at larger resolutions, such as 960x544, succeed. The article dissects the problem, analyzes the provided logs, and offers potential solutions or insights for users facing similar failures. Understanding how model export behaves in frameworks like FoundationStereo matters for deploying the model across different hardware and software platforms.

Background on FoundationStereo and ONNX

FoundationStereo is a project for stereo matching and depth estimation. Stereo vision, the process of inferring 3D information from two or more images, underpins applications such as robotics, autonomous navigation, and augmented reality. Judging by the argument dump and log analyzed below, the project ships pre-trained checkpoints and relies on a DINOv2 vision transformer loaded through torch hub.

ONNX, on the other hand, is an open standard for representing machine learning models. Its primary goal is to facilitate interoperability between different deep learning frameworks. By converting a model to the ONNX format, users can deploy it on a variety of platforms and runtimes, regardless of the framework in which it was initially trained. This flexibility is paramount in real-world scenarios where deployment environments can vary significantly.

Exporting a model to ONNX typically involves tracing the model's execution graph and serializing it into a standardized format. This process can sometimes be sensitive to specific configurations, input shapes, and operations within the model. Errors during export, as highlighted in the user's issue, can stem from various sources, including framework-specific quirks, unsupported operations, or dimension mismatches.

The Reported Issue: 320x192 Resolution Export Failure

The core of the issue lies in the failure to export the FoundationStereo model to ONNX format when the specified resolution is 320x192. The user explicitly mentioned that the dimensions were chosen to comply with the requirement of being multiples of 32. This suggests an underlying problem beyond simple dimension constraints. While larger resolutions like 960x544 worked without issues, the 320x192 configuration triggered an error, indicating a potential edge case or a specific interaction within the model's architecture at this resolution. The provided log, which we will dissect in the following sections, offers crucial clues into the nature of this error.
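The multiple-of-32 claim is easy to verify, and the same arithmetic shows how small the intermediate feature maps become at 320x192. The sketch below is illustrative only: the downsampling factors (4, 8, 16, 32) are an assumption typical of stereo backbones, not values read from the FoundationStereo source, and both helper functions are hypothetical.

```python
# Illustrative arithmetic only; the stride list is an assumption, not read
# from the FoundationStereo code.
ASSUMED_STRIDES = (4, 8, 16, 32)


def is_multiple_of_32(height, width):
    """Return True when both dimensions are divisible by 32."""
    return height % 32 == 0 and width % 32 == 0


def feature_map_sizes(height, width, strides=ASSUMED_STRIDES):
    """Map each assumed stride to the (height, width) of its feature map."""
    return {s: (height // s, width // s) for s in strides}


# Compare the failing resolution with the one reported to work.
for height, width in [(192, 320), (544, 960)]:
    print(f"{width}x{height}: multiple of 32 = {is_multiple_of_32(height, width)}")
    for stride, (h, w) in feature_map_sizes(height, width).items():
        print(f"  1/{stride} scale -> {w}x{h}")
```

Both resolutions pass the divisibility check, so that constraint alone does not separate the failing case from the working one. What differs is the size of the coarsest maps: 10x6 at 1/32 scale for 320x192 versus 30x17 for 960x544, under the assumed strides.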

Analysis of the Error Log

The provided log snippet gives us valuable insights into the sequence of operations leading to the error. Let's break down the key parts:

Argument Configuration:

args:
{'corr_implementation': 'reg', 'corr_levels': 2, 'corr_radius': 4, 'finetune_ckpt_name': 'model_best_bp2.pth', 'finetune_from': None, 'hidden_dims': [128, 128, 128], 'img_gamma': None, 'inference_tile': 0, 'low_memory': 0, 'max_disp': 416, 'max_val_sample': None, 'mixed_precision': True, 'n_downsample': 2, 'n_gru_layers': 3, 'notes': '', 'num_steps': 200000, 'num_worker': 8, 'slow_fast_gru': False, 'tags_more': [], 'tile_min_overlap': [16, 16], 'tile_wtype': 'gaussian', 'time_limit': 14400, 'train_iters': 22, 'val_interval': 1, 'valid_iters': 20, 'vit_size': 'vits', 'wdecay': 0, 'world_size': 32, 'save_path': './pretrained_models/foundation_stereo_new.onnx', 'ckpt_dir': './pretrained_models/11-33-40/model_best_bp2.pth', 'height': 192, 'width': 320}

This section shows the configuration used for the export. It confirms the 320x192 resolution (height: 192, width: 320), the checkpoint being loaded (ckpt_dir), and the output file (save_path). It also records other settings that shape the exported graph, such as corr_implementation ('reg'), mixed_precision (True), max_disp (416), and vit_size ('vits').
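Because the argument dump is printed as a Python dict literal, it can be parsed back for a quick sanity check instead of being read by eye. The sketch below uses an abridged copy of the logged dict (only a few keys are kept) and the standard-library `ast.literal_eval`. The comparison between `max_disp` and the input width is a check a reader might want to run. This log excerpt does not establish that it is related to the export failure.

```python
import ast

# Abridged copy of the logged argument dict; the remaining keys are omitted
# here for brevity, not because they are unimportant.
logged_args = (
    "{'corr_implementation': 'reg', 'max_disp': 416, 'mixed_precision': True, "
    "'n_downsample': 2, 'vit_size': 'vits', 'height': 192, 'width': 320}"
)

# literal_eval only accepts Python literals, so it is safe on log text.
args = ast.literal_eval(logged_args)

resolution = (args["width"], args["height"])
print("resolution:", resolution)
print("max_disp:", args["max_disp"])

# Assumption-labelled check: flag the case where the configured disparity
# range is wider than the image itself. Whether this matters for the export
# is not confirmed by the log shown above.
disp_exceeds_width = args["max_disp"] > args["width"]
print("max_disp exceeds width:", disp_exceeds_width)
```

For the logged configuration the check reports True (416 > 320). At 960x544 with the same max_disp it would report False, which makes this one difference between the two runs worth keeping in mind while reading the rest of the log.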

Loading and Setup:

Using pretrained model from ./pretrained_models/11-33-40/model_best_bp2.pth
Using cache found in /home/kaylor/.cache/torch/hub/facebookresearch_dinov2_main
/home/kaylor/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/swiglu_ffn.py:45: UserWarning: xFormers is disabled (SwiGLU)
warnings.warn(