Skip to content

Conversation

groberts-flex
Copy link
Contributor

@groberts-flex groberts-flex commented Aug 27, 2025

Adds NaN check for vjp data before adjoint simulations get created

Greptile Summary

This PR adds robust NaN detection and error handling to the adjoint optimization pipeline in Tidy3D's autograd system. The core change introduces a validation check in the setup_adj function within tidy3d/web/api/autograd/autograd.py that scans Vector-Jacobian Product (VJP) data for NaN values before adjoint simulations are created.

The implementation strategically places the NaN check after filtering out zero gradient values but before creating adjoint simulations. When NaN values are detected, the system raises an AdjointError with a clear, informative message that helps users identify whether the issue stems from simulation data corruption or problems in their objective function computations.

This change integrates seamlessly with Tidy3D's existing adjoint optimization workflow, which is used extensively for inverse design applications. The adjoint method computes gradients by running backward simulations, and NaN values in the VJP data can cause these backward simulations to fail or produce invalid gradients silently. By catching these issues early, the system prevents downstream failures and provides actionable feedback to users.

The PR includes comprehensive test coverage through a new parameterized test test_vjp_nan that deliberately introduces NaN values into objective functions and verifies proper error handling across different structure and monitor combinations. The change is also properly documented in the changelog, following the project's documentation standards.

Important Files Changed

Files Changed
Filename Score Overview
tidy3d/web/api/autograd/autograd.py 5/5 Added NaN detection check in adjoint pipeline setup to prevent downstream failures
tests/test_components/test_autograd.py 4/5 Added comprehensive test coverage for NaN detection functionality
CHANGELOG.md 5/5 Documented the new NaN validation feature for users

Confidence score: 5/5

  • This PR is safe to merge with minimal risk of breaking existing functionality
  • Score reflects well-designed defensive programming with proper error handling, comprehensive testing, and clear documentation
  • No files require special attention as all changes follow established patterns and best practices

Sequence Diagram

sequenceDiagram
    participant User
    participant AutogradAPI as "tidy3d.web.run"
    participant RunPrimitive as "_run_primitive"
    participant ForwardSim as "Forward Simulation"
    participant PostProcessAdj as "postprocess_adj" 
    participant VJPFunction as "VJP Function"
    participant AdjointSim as "Adjoint Simulation"
    participant ErrorCheck as "NaN Check"

    User->>AutogradAPI: run(simulation, task_name, ...)
    AutogradAPI->>AutogradAPI: is_valid_for_autograd(simulation)
    AutogradAPI->>RunPrimitive: _run_primitive(traced_fields, ...)
    RunPrimitive->>ForwardSim: Execute forward simulation
    ForwardSim-->>RunPrimitive: Return simulation data
    RunPrimitive->>RunPrimitive: postprocess_fwd() - cache forward data
    RunPrimitive-->>AutogradAPI: Return traced field data

    Note over User,AdjointSim: When gradients are computed (backward pass)

    AutogradAPI->>VJPFunction: Execute VJP function
    VJPFunction->>ErrorCheck: Check vjp data for NaN values
    
    alt NaN values detected
        ErrorCheck-->>VJPFunction: Raise AdjointError
        VJPFunction-->>User: "NaN values detected in the adjoint pipeline"
    else No NaN values
        ErrorCheck-->>VJPFunction: Data valid, continue
        VJPFunction->>VJPFunction: setup_adj() - create adjoint simulations
        VJPFunction->>AdjointSim: Run adjoint simulation(s)
        AdjointSim-->>VJPFunction: Return adjoint data
        VJPFunction->>PostProcessAdj: postprocess_adj(adjoint_data, ...)
        PostProcessAdj-->>VJPFunction: Return gradients
        VJPFunction-->>AutogradAPI: Return VJP gradients
    end

    AutogradAPI-->>User: Return simulation data with gradients
Loading

Context used:

Rule - Do not use markdown formatting in exception or warning messages; use single quotes to highlight variable names and code. (link)
Rule - Use changelog categories correctly: "Fixed" for bug fixes, "Changed" for modifications to existing functionality, and "Added" for new features. (link)
Rule - In changelogs, enclose code identifiers (class, function names) in backticks and use specific names rather than generic descriptions. (link)
Rule - Prefer pytest.mark.parametrize over manual for loops to define and run distinct test cases, reducing code duplication. (link)
Rule - Assert the direct outcome of an operation rather than a side effect (like a log message) when possible. (link)

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

Copy link
Contributor

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/web/api/autograd/autograd.py (100%)

Summary

  • Total: 3 lines
  • Missing: 0 lines
  • Coverage: 100%

Copy link
Collaborator

@yaugenst-flex yaugenst-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @groberts-flex looks great!

@groberts-flex groberts-flex force-pushed the groberts-flex/negative_amplitude_src branch from 35d6c27 to 6da0e4e Compare September 3, 2025 12:10
@groberts-flex groberts-flex force-pushed the groberts-flex/negative_amplitude_src branch 2 times, most recently from 6da0e4e to bcb912a Compare September 3, 2025 20:56
@yaugenst-flex yaugenst-flex added this pull request to the merge queue Sep 4, 2025
Merged via the queue into develop with commit 8b07a65 Sep 4, 2025
24 checks passed
@yaugenst-flex yaugenst-flex deleted the groberts-flex/negative_amplitude_src branch September 4, 2025 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants