Numerical Stability
Prerequisites: RCWA overview → FDTD overview → this page

This is an advanced topic. If you're just getting started, feel free to skip this for now.
RCWA simulations can suffer from numerical instability, especially at high Fourier orders, short wavelengths, and with absorbing materials. COMPASS implements a five-layer defense system to address these issues.
Sources of instability
A note on precision: float32 has a machine epsilon of about 1.2e-7, while float64's is about 2.2e-16. As phase accumulates over many wave cycles, float32 errors grow steadily while float64 stays accurate.
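The effect is easy to reproduce. The sketch below (plain NumPy, independent of any COMPASS API) accumulates a large optical phase and shows that float32 loses the physically meaningful fractional part:

```python
import numpy as np

# Optical phase after ~1e6 wave cycles, plus a small meaningful offset.
# Only the phase modulo 2*pi matters physically.
total_phase = 2 * np.pi * 1_000_000 + 0.1

phi32 = np.float32(total_phase)   # ~7 significant digits
phi64 = np.float64(total_phase)   # ~16 significant digits

frac32 = float(phi32) % (2 * np.pi)
frac64 = float(phi64) % (2 * np.pi)

print(f"float64 residual phase: {frac64:.6f}")  # close to the true 0.1
print(f"float32 residual phase: {frac32:.6f}")  # rounding has destroyed it
```

At a phase of ~6.3e6 radians, float32's representable values are half a radian apart, so the sub-radian part of the phase is pure rounding noise.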
1. Eigenvalue decomposition errors
The core of RCWA is solving an eigenvalue problem for the field modes in each layer. At high Fourier orders the mode matrix becomes large and ill-conditioned, and decomposition errors show up as:
- Loss of orthogonality in eigenvectors
- Incorrect eigenvalue ordering
- Near-degenerate eigenvalue pairs mixing
2. Exponential overflow in T-matrix
The traditional transfer-matrix (T-matrix) approach propagates fields through layers by multiplying matrices containing factors of the form e^{±ik_z d}. For evanescent modes, k_z has a large imaginary part, so the growing factor e^{+|Im(k_z)| d} increases exponentially with layer thickness d, quickly overflowing floating-point range and making the combined matrix numerically singular.
3. TF32 on NVIDIA GPUs
NVIDIA Ampere and newer GPUs use TF32 (TensorFloat-32) by default for float32 matrix multiplications. TF32 has only 10 mantissa bits (vs. 23 for float32), which destroys RCWA accuracy silently. COMPASS disables TF32 by default.
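The magnitude of the problem can be reproduced without a GPU. The sketch below emulates TF32 in plain NumPy by rounding float32 inputs to 10 explicit mantissa bits (a software approximation, not NVIDIA's implementation) and compares matrix-multiplication error against a float64 reference:

```python
import numpy as np

def quantize_tf32(x):
    """Emulate TF32 input rounding: keep 10 explicit mantissa bits
    (float32 has 23) with round-to-nearest via integer bit tricks."""
    u = x.astype(np.float32).view(np.uint32)
    # Add half of the dropped range, then clear the low 13 mantissa bits
    u = (u + np.uint32(1 << 12)) & np.uint32(0xFFFFE000)
    return u.view(np.float32)

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)

ref = a.astype(np.float64) @ b.astype(np.float64)
err_fp32 = np.abs(a @ b - ref).max()
err_tf32 = np.abs(quantize_tf32(a) @ quantize_tf32(b) - ref).max()

print(f"max |error|, float32 matmul     : {err_fp32:.2e}")
print(f"max |error|, TF32-rounded matmul: {err_tf32:.2e}")
```

The TF32-rounded result is typically orders of magnitude less accurate, which is why leaving TF32 enabled silently corrupts eigendecompositions built on these products.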
4. Fourier factorization at discontinuities
At sharp material boundaries (metal grids, DTI), the standard Fourier representation of permittivity converges slowly and can produce unphysical results (Gibbs phenomenon). This primarily affects TM polarization.
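The Gibbs overshoot is easy to see in one dimension. This standalone sketch (toy numbers, not a COMPASS calculation) sums the Fourier series of a permittivity step and shows that the overshoot does not shrink as more orders are added:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20001)

def partial_sum(n_orders):
    """Truncated Fourier series of a square-wave permittivity that
    jumps between 1 and 12 at x = 0 and x = 0.5."""
    f = np.full_like(x, 6.5)                     # DC term: (1 + 12) / 2
    for m in range(1, n_orders + 1, 2):          # odd harmonics only
        f += (12 - 1) * 2 / (np.pi * m) * np.sin(2 * np.pi * m * x)
    return f

overshoots = {M: partial_sum(M).max() - 12.0 for M in (11, 51, 201)}
for M, o in overshoots.items():
    print(f"{M:>3} orders: overshoot = {o:+.3f}")
# The overshoot converges to ~9% of the jump (about +0.98 here),
# no matter how many orders are retained
```

This persistent overshoot near the discontinuity is exactly what degrades TM convergence when the permittivity is factorized naively.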
The five-layer defense
COMPASS addresses each instability source with a dedicated module in compass.solvers.rcwa.stability:
Layer 1: PrecisionManager
Controls floating-point precision at every stage:
```yaml
solver:
  stability:
    precision_strategy: "mixed"   # float32 for most, float64 for eigendecomp
    allow_tf32: false             # CRITICAL: disable TF32
    eigendecomp_device: "cpu"     # CPU eigendecomp is more stable than GPU
```

The mixed strategy keeps most computation in float32 for speed but promotes the eigendecomposition to float64:

```python
# Internally, COMPASS does:
matrix_f64 = matrix.astype(np.complex128)
eigenvalues, eigenvectors = np.linalg.eig(matrix_f64)
# Then converts back to original precision
```

Layer 2: S-Matrix Algorithm
COMPASS uses the Redheffer star product instead of the T-matrix. The star product combines S-matrices layer by layer using only bounded quantities:
The key property: every intermediate exponential has the decaying form e^{−|Im(k_z)| d}, with magnitude at most 1, so nothing can overflow no matter how thick the layer.
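As a toy illustration (scalar 2-port S-matrices, not COMPASS's actual matrix implementation), the star product below combines 200 strongly evanescent layers without ever forming a growing exponential:

```python
import numpy as np

def star(A, B):
    """Redheffer star product of two scalar 2-port S-matrices
    [[S11, S12], [S21, S22]]; A sits on the input side of B."""
    d = 1.0 - A[1][1] * B[0][0]              # multiple-reflection denominator
    return [
        [A[0][0] + A[0][1] * B[0][0] * A[1][0] / d, A[0][1] * B[0][1] / d],
        [B[1][0] * A[1][0] / d, B[1][1] + B[1][0] * A[1][1] * B[0][1] / d],
    ]

# Toy passive layer: weak reflection, strongly evanescent transmission
r, t = 0.1, np.exp(-5.0)                     # t = e^{-kappa*d}, kappa*d = 5
layer = [[r, t], [t, r]]

# A T-matrix for the same stack would need e^{+kappa*d} per layer:
n_layers = 200
with np.errstate(over="ignore"):
    t_growth = np.exp(5.0 * n_layers)        # e^{1000} -> inf in float64
print("T-matrix growth factor:", t_growth)

# The star product only ever multiplies bounded quantities
S = layer
for _ in range(n_layers - 1):
    S = star(S, layer)
print("combined S11, S21:", S[0][0], S[1][0])  # finite, well-behaved
```

The T-matrix growth factor overflows float64 after a handful of layers, while every intermediate of the star product stays below 1 in magnitude.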
Layer 3: Li Factorization
For structures with sharp dielectric boundaries, COMPASS applies Li's inverse rule: field components normal to a boundary are multiplied by [[1/ε]]⁻¹, the inverse of the Toeplitz matrix built from the Fourier coefficients of 1/ε, instead of the naive [[ε]]. This restores fast convergence for TM polarization at material discontinuities.
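The difference between the two rules shows up already in a 1-D toy example. The sketch below (illustrative only, not COMPASS's factorization code) builds both operators for a binary grating and shows they disagree substantially at finite truncation:

```python
import numpy as np

# 1-D binary grating: eps = 12 over half the period, 1 elsewhere (sharp jumps)
N = 4096
x = np.arange(N) / N
eps = np.where(x < 0.5, 12.0, 1.0)

def toeplitz_of(f, M):
    """Toeplitz matrix [[f]]_{mn} = c_{m-n} from the FFT coefficients of f."""
    c = np.fft.fft(f) / len(f)               # c[k]; negative orders wrap
    T = np.empty((2 * M + 1, 2 * M + 1), dtype=complex)
    for m in range(2 * M + 1):
        for n in range(2 * M + 1):
            T[m, n] = c[(m - n) % len(f)]
    return T

M = 7                                          # Fourier truncation order
E_naive = toeplitz_of(eps, M)                  # [[eps]]      (Laurent rule)
E_li = np.linalg.inv(toeplitz_of(1 / eps, M))  # [[1/eps]]^-1 (Li's inverse rule)

# At finite truncation the two operators differ substantially; that
# difference is what causes slow TM convergence with the naive rule
print("max |[[eps]] - [[1/eps]]^-1| =", np.abs(E_naive - E_li).max())
```

As the truncation order grows, both operators converge, but the inverse rule does so uniformly while the naive rule converges slowly near the discontinuities.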
Layer 4: Eigenvalue Stabilizer
Post-processing of eigenvalues to handle:
- Degenerate eigenvalues: When two eigenvalues are closer than a threshold (`eigenvalue_broadening: 1e-10`), their eigenvectors are orthogonalized via Gram-Schmidt.
- Branch selection: The square root of each eigenvalue must use the correct sign. COMPASS enforces Re(k_z) ≥ 0 for propagating modes and the correct decay direction for evanescent modes.
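A minimal sketch of both steps, with assumed names and the simple rule Im(q) ≥ 0 for the branch (COMPASS's actual criterion may differ):

```python
import numpy as np

def stabilize(eigvals, eigvecs, broadening=1e-10):
    """Sketch: (1) pick the square-root branch with Im(q) >= 0 so
    evanescent modes decay forward, (2) Gram-Schmidt-orthogonalize
    eigenvector pairs whose eigenvalues nearly coincide."""
    q = np.sqrt(eigvals.astype(complex))
    q = np.where(q.imag < 0, -q, q)          # branch selection

    order = np.argsort(eigvals.real)
    for i, j in zip(order[:-1], order[1:]):
        if abs(eigvals[i] - eigvals[j]) < broadening:
            vi = eigvecs[:, i]
            vj = eigvecs[:, j] - (np.vdot(vi, eigvecs[:, j]) / np.vdot(vi, vi)) * vi
            eigvecs[:, j] = vj / np.linalg.norm(vj)
    return q, eigvecs

# Nearly degenerate pair whose raw eigenvectors are almost parallel
vals = np.array([1.0, 1.0 + 1e-12, 4.0])
vecs = np.array([[1.0, 1.0, 0.0],
                 [0.0, 1e-6, 0.0],
                 [0.0, 0.0, 1.0]], dtype=complex)

q, V = stabilize(vals, vecs)
print("q =", q)                                              # [1, 1, 2]
print("overlap after fix:", abs(np.vdot(V[:, 0], V[:, 1])))  # ~0
```

Without the orthogonalization step, the two nearly parallel eigenvectors make the modal matrix close to singular, which is what destabilizes the downstream layer solve.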
Layer 5: Adaptive Precision Runner
If the energy balance check fails (|R + T + A − 1| exceeds the tolerance), COMPASS automatically escalates precision and retries:

```
float32 (GPU) ---> float64 (GPU) ---> float64 (CPU)
     fast             slower          slowest but most stable
```

This is configured via:
```yaml
solver:
  stability:
    energy_check:
      enabled: true
      tolerance: 0.02
      auto_retry_float64: true
```

Diagnosing stability issues
Pre-simulation checks
The StabilityDiagnostics.pre_simulation_check method warns about:
- Large matrices with float32 (risk: eigendecomp failure)
- Thick layers without S-matrix (risk: T-matrix overflow)
- TF32 enabled (risk: silent accuracy loss)
- Patterned layers with naive factorization (risk: slow convergence)
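The checks above amount to a handful of heuristics. Here is an illustrative sketch; the config keys and thresholds are made up for the example, not COMPASS's actual schema:

```python
def pre_simulation_check(cfg):
    """Warn about configurations known to trigger instability."""
    warnings = []
    n_modes = (2 * cfg["fourier_orders"] + 1) ** 2   # 2-D truncation
    if cfg["precision"] == "float32" and n_modes > 10_000:
        warnings.append("large matrix with float32: eigendecomp may fail")
    if cfg["algorithm"] == "tmatrix" and cfg["max_layer_thickness_um"] > 1.0:
        warnings.append("thick layers without S-matrix: overflow risk")
    if cfg["allow_tf32"]:
        warnings.append("TF32 enabled: silent accuracy loss")
    if cfg["has_patterned_layers"] and cfg["factorization"] == "naive":
        warnings.append("naive factorization: slow TM convergence")
    return warnings

risky = {"fourier_orders": 60, "precision": "float32",
         "algorithm": "tmatrix", "max_layer_thickness_um": 3.0,
         "allow_tf32": True, "has_patterned_layers": True,
         "factorization": "naive"}
for w in pre_simulation_check(risky):
    print("WARN:", w)
```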
Post-simulation checks
After simulation, StabilityDiagnostics.post_simulation_check validates:
- QE in the range [0, 1] for all pixels
- No NaN or Inf in results
- Energy conservation: R + T + A = 1 within the configured tolerance
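A corresponding sketch of the post-simulation validation (the signature and names here are assumed for illustration):

```python
import numpy as np

def post_simulation_check(qe, R, T, A, tol=0.02):
    """Validate results: qe is a per-pixel QE array; R, T, A are the
    reflected, transmitted, and absorbed power fractions."""
    issues = []
    if not np.all(np.isfinite(qe)):
        issues.append("NaN or Inf in QE")
    elif qe.min() < 0.0 or qe.max() > 1.0:
        issues.append("QE outside [0, 1]")
    if abs(R + T + A - 1.0) > tol:
        issues.append("energy balance violated")
    return issues

print(post_simulation_check(np.array([0.4, 0.6]), R=0.1, T=0.3, A=0.6))  # []
print(post_simulation_check(np.array([1.2, 0.5]), R=0.5, T=0.6, A=0.2))
# -> ['QE outside [0, 1]', 'energy balance violated']
```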
Warning signs
| Symptom | Likely cause | Fix |
|---|---|---|
| QE > 100% or < 0% | Eigendecomp failure | Use precision_strategy: "float64" |
| Energy balance violation | T-matrix overflow | Ensure S-matrix algorithm is used |
| Noisy QE at short wavelengths | Float32 insufficient | Enable auto_retry_float64 |
| Slow TM convergence | Naive factorization | Switch to li_inverse |
| Condition number warning | Nearly singular matrix | Reduce Fourier order or increase broadening |
Recommended settings
For production simulations:
```yaml
solver:
  stability:
    precision_strategy: "mixed"
    allow_tf32: false
    eigendecomp_device: "cpu"
    fourier_factorization: "li_inverse"
    energy_check:
      enabled: true
      tolerance: 0.02
      auto_retry_float64: true
    eigenvalue_broadening: 1.0e-10
    condition_number_warning: 1.0e+12
```

For maximum accuracy (at the cost of speed):
```yaml
solver:
  stability:
    precision_strategy: "float64"
    allow_tf32: false
    eigendecomp_device: "cpu"
    fourier_factorization: "li_inverse"
```