Choosing a Solver

COMPASS supports multiple electromagnetic solver backends under a unified interface. This guide covers the available solvers, when to use each one, how to configure them, and how to validate convergence.

Available solvers

RCWA solvers

RCWA (Rigorous Coupled-Wave Analysis) is a frequency-domain method that solves Maxwell's equations for periodic structures by expanding fields and permittivity into Fourier harmonics. It is naturally suited to pixel arrays because pixels repeat periodically.

| Name | Library | Backend | GPU | Status |
|------|---------|---------|-----|--------|
| torcwa | torcwa | PyTorch | CUDA | Primary, recommended |
| grcwa | grcwa | NumPy/JAX | CUDA | Cross-validation |
| meent | meent | NumPy/JAX/PyTorch | CUDA/CPU | Multi-backend, good for validation |

FDTD solvers

FDTD (Finite-Difference Time-Domain) solves Maxwell's equations by stepping through time on a spatial grid. It is broadband in a single run and handles arbitrary (non-periodic) geometries.

| Name | Library | Backend | GPU | Status |
|------|---------|---------|-----|--------|
| fdtd_flaport | fdtd | PyTorch | CUDA | Prototyping |

RCWA vs FDTD: when to use each

Trade-off summary

| Criterion | RCWA | FDTD |
|-----------|------|------|
| Speed (single wavelength) | Fast (0.1--2 s) | Slow (30--120 s) |
| Speed (31-point sweep) | ~10 s (sequential) | ~45 s (one broadband run) |
| Memory | Moderate (200--800 MB) | High (1--4 GB) |
| Periodic structures | Native (Bloch BCs) | Requires periodic BCs |
| Curved surfaces | Staircase approximation | Staircase approximation |
| Dispersive metals | Natural (per-wavelength) | Requires Drude/Lorentz fit |
| Field accuracy | Good in far field | Good everywhere |
| Convergence parameter | Fourier order N | Grid spacing Δx |
| Numerical stability | S-matrix required | CFL condition |

Compare solver performance and accuracy trade-offs interactively with the "RCWA vs FDTD Solver Comparison" demo: it plots simulated quantum efficiency (QE) spectra (400--700 nm, red/green/blue channels) from RCWA (Fourier order 9) alongside FDTD (20 nm grid), and lets you adjust pixel pitch and solver parameters to see how results and performance change. At the default settings the two solvers agree well: max |ΔQE| = 2.2%, average |ΔQE| = 0.9%.

When RCWA excels

  • Standard Bayer pixel QE sweeps (the primary COMPASS use case)
  • Angular response studies (just change θ, ϕ)
  • Fourier order convergence studies
  • Parameter sweeps where each point is independent

When FDTD is preferable

  • Non-periodic or finite-sized structures
  • Broadband response in a single simulation
  • Situations where near-field detail inside the structure matters
  • Cross-validation against RCWA

Data flows through the solver pipeline, from configuration to simulation results, via a small set of abstract methods:

  • SolverFactory.create(name, config, device): instantiates the requested backend (torcwa, grcwa, or meent for RCWA; fdtd_flaport for FDTD)
  • setup_geometry(pixel_stack): calls pixel_stack.get_layer_slices() to build the 2D permittivity (eps) grids for each layer
  • setup_source(source_config): configures the illumination (wavelength, angles, polarization)
  • run(): executes the simulation and returns a SimulationResult
  • get_field_distribution(...): retrieves fields by component, plane, and position
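
A minimal end-to-end sketch of these calls, using the same names as the convergence example later in this guide (config and pixel_stack are assumed to be defined already):

python
from compass.solvers.base import SolverFactory

# Create a solver, load geometry and source, and run a single wavelength.
solver = SolverFactory.create("torcwa", config["solver"])
solver.setup_geometry(pixel_stack)
solver.setup_source({"wavelength": 0.55, "theta": 0.0,
                     "phi": 0.0, "polarization": "unpolarized"})
result = solver.run()        # returns a SimulationResult
print(result.qe_per_pixel)   # per-pixel quantum efficiency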

Solver configuration reference

torcwa

yaml
solver:
  name: "torcwa"
  type: "rcwa"
  params:
    fourier_order: [9, 9]       # Fourier harmonics [Nx, Ny]
    dtype: "complex64"           # complex64 or complex128
  stability:
    precision_strategy: "mixed"  # mixed | float32 | float64
    allow_tf32: false            # MUST be false for RCWA
    eigendecomp_device: "cpu"    # cpu | gpu (cpu is more stable)
    fourier_factorization: "li_inverse"  # li_inverse | naive
    energy_check:
      enabled: true
      tolerance: 0.02
      auto_retry_float64: true
    eigenvalue_broadening: 1.0e-10
    condition_number_warning: 1.0e+12
  convergence:
    auto_converge: false
    order_range: [5, 25]
    qe_tolerance: 0.01
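
The energy_check block guards against silent numerical failure: reflection, transmission, and absorption should sum to 1. A minimal sketch of the idea, using hypothetical attribute names (the actual COMPASS result fields may differ):

python
def check_energy_conservation(result, tol=0.02):
    # Hypothetical fields: reflection/transmission/absorption fractions.
    residual = abs(1.0 - (result.reflection
                          + result.transmission
                          + result.absorption))
    if residual > tol:
        # With auto_retry_float64 enabled, COMPASS re-runs in complex128
        # instead of failing outright.
        raise RuntimeError(f"Energy not conserved: residual = {residual:.3g}")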

Key parameter: fourier_order

The Fourier order N determines the number of harmonics retained in each direction. The total number of harmonics is (2Nx+1)×(2Ny+1). For [9, 9], this gives 19×19=361 modes. Eigenvalue problems of size 2×361=722 are solved per layer.

| Order | Modes | Matrix size | Typical runtime | Accuracy |
|-------|-------|-------------|-----------------|----------|
| [5, 5] | 121 | 242x242 | 0.1 s | Low |
| [9, 9] | 361 | 722x722 | 0.3 s | Good |
| [13, 13] | 729 | 1458x1458 | 1.5 s | High |
| [17, 17] | 1225 | 2450x2450 | 5.0 s | Very high |
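
The mode counts in this table follow directly from the formula above; a two-line helper reproduces them:

python
def rcwa_problem_size(nx: int, ny: int) -> tuple[int, int]:
    """Number of Fourier modes and eigenproblem size for order [nx, ny]."""
    modes = (2 * nx + 1) * (2 * ny + 1)
    return modes, 2 * modes

print(rcwa_problem_size(9, 9))  # (361, 722)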

Key parameter: precision_strategy

The interactive "Numerical Precision Comparison: Float32 vs Float64" demo shows how floating-point precision affects phase computation in wave optics. Float32 carries about 7 significant digits (machine epsilon 1.2e-7); float64 carries about 16 (machine epsilon 2.2e-16). As phase accumulates over many cycles, float32 errors grow (on the order of 1e-5, enough to cross a 0.5% QE tolerance after tens of cycles), while float64 errors stay near 1e-14.

The "mixed" strategy (default) runs the main simulation in complex64 but promotes eigendecomposition to complex128 and executes it on CPU. This provides a good balance of speed and stability.

$$\text{mixed} = \begin{cases} \text{float32} & \text{layer setup, FFT, S-matrix products} \\ \text{float64} & \text{eigendecomposition (on CPU)} \end{cases}$$
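
In PyTorch terms, the promotion step amounts to something like the following sketch (illustrative only, not the exact COMPASS internals):

python
import torch

def eig_mixed(A: torch.Tensor):
    # Promote to complex128 and eigendecompose on CPU for stability,
    # then cast results back to the caller's device and dtype.
    w, V = torch.linalg.eig(A.to("cpu", torch.complex128))
    return w.to(A.device, A.dtype), V.to(A.device, A.dtype)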

grcwa

yaml
solver:
  name: "grcwa"
  type: "rcwa"
  params:
    fourier_order: [9, 9]
    dtype: "complex128"     # grcwa defaults to float64
  convergence:
    auto_converge: false
    order_range: [5, 25]
    qe_tolerance: 0.01

grcwa uses NumPy-based computation with optional JAX acceleration. It defaults to complex128 and tends to be numerically more stable than torcwa at the cost of speed. It is useful for cross-validation.

meent

yaml
solver:
  name: "meent"
  type: "rcwa"
  params:
    fourier_order: [9, 9]
    dtype: "complex64"
    backend: "torch"       # numpy | jax | torch
  convergence:
    auto_converge: false
    order_range: [5, 25]
    qe_tolerance: 0.01

meent supports three backends: numpy (backend=0), jax (backend=1), and torch (backend=2). The JAX backend can leverage XLA compilation for performance. Note that meent uses nanometers internally; the COMPASS adapter handles the conversion from micrometers.
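
Because of the nanometer convention, anything passed to meent by hand needs an explicit unit conversion; the adapter does the equivalent of this sketch (variable names are illustrative):

python
# COMPASS configs use micrometers; meent works in nanometers.
wavelength_um = 0.55
pitch_um = 1.0

wavelength_nm = wavelength_um * 1e3  # 550 nm
pitch_nm = pitch_um * 1e3            # 1000 nm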

fdtd_flaport

yaml
solver:
  name: "fdtd_flaport"
  type: "fdtd"
  params:
    grid_spacing: 0.02     # Grid cell size in um (20 nm)
    runtime: 200           # Simulation time in femtoseconds
    pml_layers: 15         # PML absorber thickness in cells
    dtype: "float64"

Key parameter: grid_spacing

The grid must resolve both the smallest geometric feature and the shortest wavelength inside the highest-index material. For silicon ($n \approx 4$) at 400 nm:

$$\Delta x \le \frac{\lambda_{\min}}{n_{\max} \cdot \mathrm{PPW}} = \frac{0.400}{4.0 \times 10} = 0.010\ \mu\mathrm{m}$$

where PPW (points per wavelength) should be at least 10 for accuracy. A 20 nm grid is adequate for most visible-range simulations, but 10 nm provides better accuracy at the cost of 8x more memory (3D).
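
The same criterion as a small helper, handy when checking a new material stack:

python
def max_grid_spacing_um(lambda_min_um: float, n_max: float,
                        ppw: float = 10) -> float:
    """Coarsest grid cell that resolves the shortest in-material wavelength."""
    return lambda_min_um / (n_max * ppw)

print(max_grid_spacing_um(0.400, 4.0))  # 0.01 (um), i.e. 10 nm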

GPU vs CPU considerations

CUDA (NVIDIA GPUs)

RCWA benefits substantially from GPU acceleration for the matrix operations (FFT, eigendecomp, S-matrix products). Typical speedup is 5--20x for Fourier orders above [7, 7].

yaml
compute:
  backend: "cuda"
  gpu_id: 0

Important: Disable TF32 on Ampere+ GPUs (RTX 30xx/40xx, A100). TF32 reduces floating-point mantissa from 23 bits to 10 bits in matmul operations, which catastrophically degrades S-matrix accuracy.

yaml
solver:
  stability:
    allow_tf32: false  # Always keep this false for RCWA
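
If you drive PyTorch directly rather than through the COMPASS config, the equivalent global switches are:

python
import torch

# Disable TF32 in matmul and cuDNN paths; one or both default to True
# on some PyTorch versions when running on Ampere or newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False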

Apple Silicon (MPS)

PyTorch MPS backend works for basic tensor operations but has limitations for RCWA:

  • Complex number support is incomplete in some PyTorch versions
  • Eigendecomposition may silently fall back to CPU
  • Performance is typically slower than CUDA for RCWA workloads
yaml
compute:
  backend: "mps"

Test with CPU first if you encounter MPS errors.
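
A simple guard for portable scripts, assuming PyTorch is the backend in use:

python
import torch

# Prefer MPS only when it is actually available; otherwise fall back to CPU.
backend = "mps" if torch.backends.mps.is_available() else "cpu"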

CPU

All solvers work on CPU without any GPU dependencies. CPU mode is slower but fully functional and numerically the most reliable.

yaml
compute:
  backend: "cpu"
  num_workers: 4

Convergence testing

RCWA: Fourier order sweep

Always verify that results have converged before trusting them. Sweep the Fourier order and check that QE stabilizes.

python
import numpy as np
from compass.solvers.base import SolverFactory

orders = range(5, 22, 2)
peak_green_qe = []

for N in orders:
    config["solver"]["params"]["fourier_order"] = [N, N]
    solver = SolverFactory.create("torcwa", config["solver"])
    solver.setup_geometry(pixel_stack)
    solver.setup_source({"wavelength": 0.55, "theta": 0.0,
                         "phi": 0.0, "polarization": "unpolarized"})
    result = solver.run()

    green_qe = np.mean([
        qe for name, qe in result.qe_per_pixel.items()
        if name.startswith("G")
    ])
    peak_green_qe.append(float(green_qe))
    print(f"Order {N:2d}: Green QE = {green_qe:.4f}")

# Check convergence: successive change in QE should drop below
# the tolerance (here 1% absolute QE, matching qe_tolerance above)
order_list = list(orders)
for i in range(1, len(peak_green_qe)):
    delta = abs(peak_green_qe[i] - peak_green_qe[i - 1])
    print(f"  Order {order_list[i]:2d}: delta = {delta:.5f}")

FDTD: Grid spacing convergence

python
for spacing in [0.04, 0.02, 0.01]:
    config["solver"]["params"]["grid_spacing"] = spacing
    # ... run and compare QE

Cross-solver validation

Run the same pixel with two solvers and compare QE spectra:

python
from compass.visualization.qe_plot import plot_qe_comparison

ax_main, ax_diff = plot_qe_comparison(
    results=[torcwa_result, grcwa_result],
    labels=["torcwa", "grcwa"],
    show_difference=True,
    figsize=(10, 7),
)

Agreement within 1--2% absolute QE is expected for well-converged simulations at the same Fourier order.
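
For a quick numeric check to go with the plot, the per-pixel difference at a single wavelength can be computed from qe_per_pixel (the same field used in the convergence example above):

python
diffs = {
    name: abs(torcwa_result.qe_per_pixel[name]
              - grcwa_result.qe_per_pixel[name])
    for name in torcwa_result.qe_per_pixel
}
print(f"max |dQE| = {max(diffs.values()):.4f}")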

Performance benchmarks

Typical numbers for a 2x2 Bayer unit cell, 1.0 um pitch, single wavelength, normal incidence, NVIDIA RTX 4090:

| Solver | Order / Grid | Runtime | GPU Mem | Notes |
|--------|--------------|---------|---------|-------|
| torcwa (f32) | [9, 9] | 0.3 s | 200 MB | Default, fast |
| torcwa (f64) | [9, 9] | 0.6 s | 400 MB | Higher accuracy |
| torcwa (f32) | [15, 15] | 2.1 s | 600 MB | High accuracy |
| grcwa (f64) | [9, 9] | 0.5 s | 250 MB | Cross-validation |
| meent/torch (f32) | [9, 9] | 0.4 s | 200 MB | Comparable |
| fdtd_flaport (20 nm) | -- | 45 s | 2 GB | Broadband capable |
| fdtd_flaport (10 nm) | -- | 180 s | 8 GB | High accuracy |

For a 31-point wavelength sweep (400--700 nm, 10 nm step):

  • RCWA (torcwa, order 9): 31 x 0.3 s = ~10 s total
  • FDTD (flaport, 20 nm): 1 broadband run = ~45 s

RCWA wins for narrow sweeps; FDTD becomes competitive for very dense wavelength sampling (>100 points).
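
The crossover point falls out of simple arithmetic: divide the one-off broadband FDTD cost by the per-wavelength RCWA cost (representative numbers from the benchmark table above):

python
rcwa_per_point_s = 0.3   # torcwa, order [9, 9], per wavelength
fdtd_broadband_s = 45.0  # fdtd_flaport, 20 nm grid, one run

crossover_points = fdtd_broadband_s / rcwa_per_point_s
print(crossover_points)  # 150.0: consistent with the ">100 points" guideline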

Quick-start recommendations

| Use case | Recommended solver | Config notes |
|----------|--------------------|--------------|
| Standard QE simulation | torcwa | order [9, 9], mixed precision |
| Publication-quality results | torcwa | order [15, 15], float64 |
| Cross-validation | torcwa + grcwa | Same order, compare QE |
| Metal grid / high-contrast layers | torcwa or meent | Li inverse factorization |
| Broadband single-shot | fdtd_flaport | 10 nm grid, 200 fs runtime |
| CPU-only environment | meent (numpy) | No GPU required |

Next steps