Skip to content

Running RCWA Solvers

COMPASS provides three RCWA solver backends -- torcwa, grcwa, and meent -- all accessible through the same SolverFactory interface. This guide covers solver-specific configuration, stability settings, and convergence verification.

Creating an RCWA solver

All RCWA solvers are created through SolverFactory.create():

python
from compass.solvers.base import SolverFactory

# torcwa (recommended default)
solver = SolverFactory.create("torcwa", solver_config, device="cuda")

# grcwa (NumPy/JAX backend, good for cross-validation)
solver = SolverFactory.create("grcwa", solver_config, device="cpu")

# meent (multi-backend: numpy, jax, torch)
solver = SolverFactory.create("meent", solver_config, device="cpu")

Solver-specific configuration

torcwa

torcwa is the primary RCWA solver. It uses PyTorch for GPU-accelerated matrix operations.

yaml
solver:
  name: "torcwa"
  type: "rcwa"
  params:
    fourier_order: [9, 9]
    dtype: "complex64"
  stability:
    precision_strategy: "mixed"
    allow_tf32: false
    eigendecomp_device: "cpu"
    fourier_factorization: "li_inverse"
    eigenvalue_broadening: 1.0e-10
    energy_check:
      enabled: true
      tolerance: 0.02
      auto_retry_float64: true

Key considerations for torcwa:

  • Always set allow_tf32: false. TF32 reduces mantissa precision from 23 to 10 bits on Ampere+ GPUs, which breaks S-matrix accuracy.
  • The "mixed" precision strategy runs most computations in float32 but promotes eigendecomposition to float64 on CPU. This is the best speed/stability trade-off.
  • Li's inverse rule (fourier_factorization: "li_inverse") is critical for structures with high-contrast boundaries such as tungsten metal grids.

grcwa

grcwa uses NumPy with optional JAX acceleration. It defaults to float64, making it more numerically stable but slower.

yaml
solver:
  name: "grcwa"
  type: "rcwa"
  params:
    fourier_order: [9, 9]
    dtype: "complex128"

grcwa is best used for cross-validation against torcwa. Since it uses a different implementation of the same RCWA algorithm, agreement between the two confirms correctness.

meent

meent supports three backends selectable at runtime:

yaml
solver:
  name: "meent"
  type: "rcwa"
  params:
    fourier_order: [9, 9]
    dtype: "complex64"
    backend: "torch"   # "numpy" | "jax" | "torch"

The COMPASS adapter handles the unit conversion automatically -- meent uses nanometers internally while COMPASS uses micrometers.

Stability settings

PrecisionManager

Before running any RCWA solver, configure precision settings:

python
from compass.solvers.rcwa.stability import PrecisionManager

PrecisionManager.configure(solver_config)

This disables TF32 and sets up the correct precision context. The SingleRunner calls this automatically, but direct solver usage should invoke it manually.

Mixed precision eigendecomposition

The eigenvalue problem is the most numerically sensitive step in RCWA. Use mixed precision to keep speed while avoiding instability:

python
from compass.solvers.rcwa.stability import PrecisionManager
import numpy as np

# NumPy path (grcwa, meent-numpy)
matrix = np.random.randn(722, 722) + 1j * np.random.randn(722, 722)
eigenvalues, eigenvectors = PrecisionManager.mixed_precision_eigen(matrix)

# PyTorch path (torcwa, meent-torch)
eigenvalues, eigenvectors = PrecisionManager.mixed_precision_eigen_torch(matrix_tensor)

Both functions promote the input to float64, perform eigendecomposition, then cast back to the original precision.

Adaptive fallback

For production sweeps where some wavelengths may fail, use the AdaptivePrecisionRunner:

python
from compass.solvers.rcwa.stability import AdaptivePrecisionRunner

runner = AdaptivePrecisionRunner(tolerance=0.02)
result = runner.run_with_fallback(solver, wavelength=0.45, config=solver_config)

The fallback chain is: GPU float32 -> GPU float64 -> CPU float64. If all three fail, the runner raises a RuntimeError suggesting you reduce the Fourier order.

Convergence testing

Always verify that your results have converged before trusting them.

Fourier order sweep

The Fourier order determines the number of harmonics. For order [N, N], the matrix size is (2N+1)2. Sweep N and check that QE stabilizes:

python
import numpy as np
from compass.solvers.base import SolverFactory

orders = range(5, 22, 2)
green_qe_values = []

for N in orders:
    solver_config["params"]["fourier_order"] = [N, N]
    solver = SolverFactory.create("torcwa", solver_config)
    solver.setup_geometry(pixel_stack)
    solver.setup_source({"wavelength": 0.55, "theta": 0.0,
                         "phi": 0.0, "polarization": "unpolarized"})
    result = solver.run()

    green_qe = np.mean([
        qe for name, qe in result.qe_per_pixel.items()
        if name.startswith("G")
    ])
    green_qe_values.append(float(green_qe))
    print(f"Order {N:2d}: Green QE = {green_qe:.4f}")

# Check convergence: relative change between successive orders
for i in range(1, len(green_qe_values)):
    delta = abs(green_qe_values[i] - green_qe_values[i-1])
    converged = "CONVERGED" if delta < 0.005 else ""
    print(f"  Order {list(orders)[i]}: delta = {delta:.5f} {converged}")

Typical convergence: order [9, 9] is sufficient for most 1 um pitch pixels. Structures with fine metal grids or high-contrast layers may need [13, 13] or higher.

RCWA Convergence Demo

See how increasing the Fourier order N improves the accuracy of permittivity reconstruction and RCWA reflectance/transmittance convergence for a binary grating.

Total Harmonics:11
Matrix Size:11×11 = 121
Computational Cost (O(M³)):~1.3K
Gibbs Overshoot:9.4%
Permittivity Profile Reconstruction
14710120L/2LεPosition in unit cellOriginalFourier (N=5)
R, T Convergence vs Fourier Order
0%10%20%30%40%50%15101520Fourier Order NR (reflectance)T (transmittance)

Runtime vs accuracy trade-off

OrderMatrix sizeTypical time (GPU)Use case
[5, 5]1210.1 sQuick screening
[9, 9]3610.3 sStandard production
[13,13]7291.5 sHigh accuracy
[17,17]12255.0 sPublication quality

Fourier Order Approximation Demo

See how increasing the number of Fourier harmonics improves the approximation of a square wave (representing a DTI trench or metal grid cross-section). Notice the Gibbs phenomenon ringing at the edges.

Total Harmonics:11
Matrix Size (RCWA):11×11
Gibbs Overshoot:18.8%
10-10Λ/2ΛPosition within unit cellOriginalFourier (N=5)Gibbs region

Gibbs phenomenon: Even with many harmonics, the Fourier series overshoots by ~9% at discontinuities. In RCWA, this affects convergence at sharp material boundaries (e.g., Si/SiO₂ DTI walls). Li's factorization rules mitigate this for TM polarization.

Running a complete simulation

python
from compass.runners.single_run import SingleRunner

config = {
    "pixel": { ... },  # pixel stack definition
    "solver": {
        "name": "torcwa",
        "type": "rcwa",
        "params": {"fourier_order": [9, 9]},
        "stability": {"precision_strategy": "mixed", "fourier_factorization": "li_inverse"},
    },
    "source": {
        "wavelength": {"mode": "sweep", "sweep": {"start": 0.40, "stop": 0.70, "step": 0.01}},
        "polarization": "unpolarized",
    },
    "compute": {"backend": "auto"},
}

result = SingleRunner.run(config)
print(f"Completed: {len(result.wavelengths)} wavelengths")

Next steps