Skip to content

Solver Comparison & Selection Guide

RCWA vs FDTD Solver Comparison

Compare simulated quantum efficiency (QE) curves from RCWA and FDTD solvers. Adjust pixel pitch and solver parameters to see how results and performance change.

RCWA (Fourier order = 9)
0%20%40%60%80%100%400500600700Wavelength (nm)QE (%)RedGreenBlue
FDTD (grid = 20 nm)
0%20%40%60%80%100%400500600700Wavelength (nm)QE (%)RedGreenBlue
RCWA
Time estimate:137 ms
Memory:6 MB
Periodic structures:Yes
Arbitrary geometry:Limited
FDTD
Time estimate:188 ms
Memory:3 MB
Periodic structures:Yes
Arbitrary geometry:Yes
Agreement
Max |Delta QE|:2.2%
Avg |Delta QE|:0.9%
Status:Good agreement

Overview

COMPASS supports multiple electromagnetic solver families for pixel simulation: TMM (Transfer Matrix Method) for fast 1D planar analysis, RCWA (Rigorous Coupled-Wave Analysis) with four implementations (torcwa, grcwa, meent, fmmax) for 2D periodic structures, and FDTD (Finite-Difference Time-Domain) solvers for arbitrary geometries. This guide consolidates benchmark results, cross-solver validation data, and practical selection guidance to help you choose the right solver for your task.

Part 1: RCWA Solver Benchmark

This section compares RCWA solvers (torcwa, grcwa, meent) on the same pixel structure, measuring accuracy and performance.

Goal

Run the same BSI pixel simulation with all available RCWA solvers, then compare QE results and execution times.

Setup

python
import copy
import time
import numpy as np
import matplotlib.pyplot as plt
from compass.runners.single_run import SingleRunner
from compass.analysis.solver_comparison import SolverComparison
from compass.visualization.qe_plot import plot_qe_comparison

pixel_config = {
    "pitch": 1.0,
    "unit_cell": [2, 2],
    "bayer_map": [["R", "G"], ["G", "B"]],
    "layers": {
        "air": {"thickness": 1.0, "material": "air"},
        "microlens": {
            "enabled": True, "height": 0.6,
            "radius_x": 0.48, "radius_y": 0.48,
            "material": "polymer_n1p56",
            "profile": {"type": "superellipse", "n": 2.5, "alpha": 1.0},
            "shift": {"mode": "none"},
        },
        "planarization": {"thickness": 0.3, "material": "sio2"},
        "color_filter": {
            "thickness": 0.6,
            "materials": {"R": "cf_red", "G": "cf_green", "B": "cf_blue"},
            "grid": {"enabled": True, "width": 0.05, "material": "tungsten"},
        },
        "barl": {"layers": [
            {"thickness": 0.010, "material": "sio2"},
            {"thickness": 0.025, "material": "hfo2"},
            {"thickness": 0.015, "material": "sio2"},
            {"thickness": 0.030, "material": "si3n4"},
        ]},
        "silicon": {
            "thickness": 3.0, "material": "silicon",
            "photodiode": {"position": [0, 0, 0.5], "size": [0.7, 0.7, 2.0]},
            "dti": {"enabled": True, "width": 0.1, "material": "sio2"},
        },
    },
}

source_config = {
    "wavelength": {"mode": "sweep", "sweep": {"start": 0.42, "stop": 0.68, "step": 0.02}},
    "polarization": "unpolarized",
}

Run all RCWA solvers

python
solvers = [
    {"name": "torcwa", "type": "rcwa", "params": {"fourier_order": [9, 9]}},
    {"name": "grcwa",  "type": "rcwa", "params": {"fourier_order": [9, 9]}},
    {"name": "meent",  "type": "rcwa", "params": {"fourier_order": [9, 9]}},
]

results = []
labels = []

for solver_cfg in solvers:
    config = {
        "pixel": pixel_config,
        "solver": {
            **solver_cfg,
            "stability": {"precision_strategy": "mixed", "fourier_factorization": "li_inverse"},
        },
        "source": source_config,
        "compute": {"backend": "auto"},
    }

    try:
        result = SingleRunner.run(config)
        results.append(result)
        labels.append(solver_cfg["name"])
        runtime = result.metadata.get("runtime_seconds", 0)
        print(f"{solver_cfg['name']}: {runtime:.2f}s")
    except Exception as e:
        print(f"{solver_cfg['name']}: FAILED - {e}")

Compare QE spectra

python
if len(results) >= 2:
    fig, ax = plt.subplots(figsize=(10, 6))
    plot_qe_comparison(results, labels, ax=ax)
    ax.set_title("RCWA Solver Comparison")
    plt.tight_layout()
    plt.savefig("solver_comparison_qe.png", dpi=150)
    plt.show()

Quantitative comparison

python
if len(results) >= 2:
    comparison = SolverComparison(results, labels)
    summary = comparison.summary()

    print("\n=== Comparison Summary ===")
    print("\nMax QE difference (absolute):")
    for key, val in summary["max_qe_diff"].items():
        print(f"  {key}: {val:.5f}")

    print("\nMax QE relative error (%):")
    for key, val in summary["max_qe_relative_error_pct"].items():
        print(f"  {key}: {val:.2f}%")

    print("\nRuntimes:")
    for solver, runtime in summary["runtimes_seconds"].items():
        print(f"  {solver}: {runtime:.2f}s")

RCWA Convergence Demo

See how increasing the Fourier order N improves the accuracy of permittivity reconstruction and RCWA reflectance/transmittance convergence for a binary grating.

Total Harmonics:11
Matrix Size:11×11 = 121
Computational Cost (O(M³)):~1.3K
Gibbs Overshoot:9.4%
Permittivity Profile Reconstruction
14710120L/2LεPosition in unit cellOriginalFourier (N=5)
R, T Convergence vs Fourier Order
0%10%20%30%40%50%15101520Fourier Order NR (reflectance)T (transmittance)

Fourier Order Approximation Demo

See how increasing the number of Fourier harmonics improves the approximation of a square wave (representing a DTI trench or metal grid cross-section). Notice the Gibbs phenomenon ringing at the edges.

Total Harmonics:11
Matrix Size (RCWA):11×11
Gibbs Overshoot:18.8%
10-10Λ/2ΛPosition within unit cellOriginalFourier (N=5)Gibbs region

Gibbs phenomenon: Even with many harmonics, the Fourier series overshoots by ~9% at discontinuities. In RCWA, this affects convergence at sharp material boundaries (e.g., Si/SiO₂ DTI walls). Li's factorization rules mitigate this for TM polarization.

Fourier order convergence comparison

Compare how each solver converges with Fourier order:

python
orders = [5, 7, 9, 11, 13, 15]
convergence_data = {s["name"]: [] for s in solvers}

for order in orders:
    for solver_cfg in solvers:
        config = {
            "pixel": pixel_config,
            "solver": {
                **solver_cfg,
                "params": {"fourier_order": [order, order]},
                "stability": {"precision_strategy": "mixed"},
            },
            "source": {
                "wavelength": {"mode": "single", "value": 0.55},
                "polarization": "unpolarized",
            },
            "compute": {"backend": "auto"},
        }

        try:
            result = SingleRunner.run(config)
            avg_qe = np.mean([qe[0] for qe in result.qe_per_pixel.values()])
            convergence_data[solver_cfg["name"]].append(avg_qe)
        except:
            convergence_data[solver_cfg["name"]].append(np.nan)

    print(f"Order {order}: done")

# Plot convergence
plt.figure(figsize=(8, 5))
for name, qe_values in convergence_data.items():
    plt.plot(orders, qe_values, "o-", label=name, linewidth=2)

plt.xlabel("Fourier Order")
plt.ylabel("Average QE at 550 nm")
plt.title("Convergence: QE vs Fourier Order")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("convergence_comparison.png", dpi=150)

Runtime scaling

Measure how runtime scales with Fourier order:

python
orders = [5, 7, 9, 11, 13, 15, 17]
timing_data = {s["name"]: [] for s in solvers}

for order in orders:
    for solver_cfg in solvers:
        config = {
            "pixel": pixel_config,
            "solver": {
                **solver_cfg,
                "params": {"fourier_order": [order, order]},
                "stability": {"precision_strategy": "mixed"},
            },
            "source": {
                "wavelength": {"mode": "single", "value": 0.55},
                "polarization": "TE",  # Single pol for timing
            },
            "compute": {"backend": "auto"},
        }

        try:
            result = SingleRunner.run(config)
            t = result.metadata.get("runtime_seconds", 0)
            timing_data[solver_cfg["name"]].append(t)
        except:
            timing_data[solver_cfg["name"]].append(np.nan)

# Plot
plt.figure(figsize=(8, 5))
for name, times in timing_data.items():
    plt.plot(orders, times, "o-", label=name, linewidth=2)

plt.xlabel("Fourier Order")
plt.ylabel("Runtime (seconds)")
plt.title("Runtime Scaling vs Fourier Order")
plt.legend()
plt.grid(True, alpha=0.3)
plt.yscale("log")
plt.tight_layout()
plt.savefig("runtime_scaling.png", dpi=150)

Expected results

  • QE agreement: All RCWA solvers should agree within 0.5-1% QE at the same Fourier order.
  • Convergence: All solvers converge to the same QE as order increases. Convergence rate may differ slightly.
  • Runtime: torcwa is typically fastest on GPU. grcwa and meent are competitive.
  • FDTD vs RCWA: Should agree within 1-2% QE. FDTD is significantly slower for single-wavelength runs.

Part 2: TMM vs RCWA

Cross-validation between TMM (1D) and RCWA (2D/3D) solvers confirms that different numerical methods produce physically consistent results and helps identify where 1D approximations break down relative to full 2D electromagnetic solutions.

Overview

Three solvers are compared on the same pixel structure:

  • TMM (Transfer Matrix Method): 1D analytical solver that treats each layer as an infinite uniform slab. Ignores lateral patterning (microlens shape, Bayer CF pattern, DTI). Extremely fast (~3 ms for a full sweep).
  • torcwa: PyTorch-based 2D RCWA solver using the S-matrix algorithm. Handles full lateral structure including microlens profile, color filter Bayer pattern, and DTI boundaries.
  • grcwa: NumPy-based 2D RCWA solver with autograd support. Independent RCWA implementation used to cross-check torcwa results.

Simulation Parameters

  • Pixel: 1.0 um pitch BSI, 2x2 RGGB Bayer
  • Stack: 9 layers (air / microlens / planarization / CF / 4-layer BARL / silicon)
  • RCWA: Fourier order [3,3], 5 microlens slices, 64x64 grid
  • Source: Normal incidence, unpolarized, 380-780 nm (20 nm step)
  • Platform: macOS (Apple Silicon), CPU only

Interactive Comparison

Cross-Solver Validation: TMM vs torcwa vs grcwa

Compare absorption, reflection, and transmission spectra from three different electromagnetic solvers. TMM is a fast 1D method, while torcwa and grcwa are full 2D RCWA solvers.

0.00.20.40.60.81.0400450500550600650700750Wavelength (nm)Absorption (A) TMM (1D) - 2.9ms torcwa (2D RCWA) - 15.7s grcwa (2D RCWA) - 0.1s
TMM mean Absorption (A):0.6435
torcwa mean Absorption (A):0.9838
grcwa mean Absorption (A):0.9839
Max |torcwa - grcwa|:0.0114
Show numerical data
Wavelength (nm)TMMtorcwagrcwa|torcwa - grcwa|
3800.90810.98530.98050.0048
4000.87500.97610.98610.0100
4200.82100.99790.98690.0110
4400.85660.99310.98750.0056
4600.79230.99820.98980.0084
4800.71270.98620.98810.0019
5000.39900.98140.98520.0038
5200.07770.97420.98560.0114
5400.05720.98680.98790.0011
5600.35280.99840.98860.0098
5800.63890.99710.98610.0110
6000.71900.98970.98240.0073
6200.73490.98020.97970.0005
6400.75630.96840.97930.0109
6600.74550.97610.98070.0046
6800.69350.97570.98260.0069
7000.65490.97770.98400.0063
7200.65640.98050.98370.0032
7400.68150.98210.98190.0002
7600.69610.98130.97920.0021
7800.68490.97390.97650.0026

Wavelength Sweep Results

Full Data Table

Complete R, T, A values for all 21 wavelengths across the three solvers. Light propagates from air (top) through the stack into silicon (bottom).

λ (nm)TMM RTMM TTMM Atorcwa Rtorcwa Ttorcwa Agrcwa Rgrcwa Tgrcwa A
3800.04010.05170.90810.01470.00000.98530.01950.00000.9805
4000.05290.07220.87500.02390.00000.97610.01390.00000.9861
4200.09610.08300.82100.00210.00000.99790.01310.00000.9869
4400.03480.10850.85660.00690.00000.99310.01250.00000.9875
4600.06330.14430.79230.00180.00000.99820.01020.00000.9898
4800.03240.25490.71270.01380.00000.98620.01190.00000.9881
5000.14610.45490.39900.01860.00000.98140.01480.00000.9852
5200.00880.91350.07770.02580.00000.97420.01440.00000.9856
5400.24050.70230.05720.01310.00010.98680.01210.00000.9879
5600.16970.47750.35280.00130.00030.99840.01140.00000.9886
5800.02800.33310.63890.00210.00080.99710.01390.00000.9861
6000.03870.24230.71900.00770.00260.98970.01760.00000.9824
6200.04770.21750.73490.01370.00610.98020.02020.00000.9797
6400.01800.22580.75630.02230.00930.96840.02060.00010.9793
6600.02090.23360.74550.01260.01130.97610.01930.00010.9807
6800.07810.22840.69350.01160.01260.97570.01730.00010.9826
7000.11990.22520.65490.00930.01300.97770.01590.00010.9840
7200.10910.23450.65640.00550.01400.98050.01610.00020.9837
7400.06640.25220.68150.00320.01470.98210.01780.00030.9819
7600.03720.26670.69610.00450.01420.98130.02050.00040.9792
7800.04300.27210.68490.01160.01450.97390.02310.00040.9765

Key Observations

1D vs 2D Differences

The TMM and RCWA results differ significantly, and understanding why is critical for choosing the right solver for a given task.

1. TMM vs RCWA Absorption

TMM shows absorption ranging from 5% to 91%, while RCWA consistently shows 97-100% absorption across the entire visible spectrum. The 3 um silicon layer absorbs nearly all light that successfully enters it. In TMM (1D), thin-film interference between planar layers creates constructive and destructive patterns that open transmission windows, especially around 500-540 nm where the green color filter becomes transparent. In RCWA (2D), the actual microlens profile focuses light, the Bayer color filter pattern introduces lateral index variation, and DTI boundaries confine light within the pixel. These 2D effects significantly increase the optical path length through silicon and reduce coherent back-reflection.

2. Transmission

TMM predicts substantial transmission (5-91%) through the stack, with a dramatic peak near 520 nm (T=0.91). In contrast, both RCWA solvers show near-zero transmission (T < 0.015 for torcwa, T < 0.0004 for grcwa). The 3 um silicon layer at the Fourier order [3,3] resolution used in RCWA effectively absorbs all power that is not reflected. The S-matrix calculation through thick absorbing layers approaches machine precision limits, and lateral confinement by DTI prevents light from escaping sideways.

3. Reflection

TMM reflectance exhibits strong thin-film oscillations (1-24%), characteristic of coherent interference in a planar multilayer. RCWA reflectance is low and spectrally smooth (0.1-2.6%). The 2D microlens profile acts as a graded-index anti-reflection structure: light incident on a curved surface couples more efficiently into the stack than light hitting a flat interface, reducing specular reflection significantly.

4. torcwa vs grcwa Agreement

Both RCWA solvers agree well with each other, with reflectance differences within 0.5-1.5% absolute and absorption differences within 0.5-2%. This level of agreement between two independent RCWA implementations (PyTorch-based vs NumPy-based) provides strong validation that both solvers are computing the electromagnetic problem correctly. Small residual differences arise from numerical precision in eigendecomposition and S-matrix assembly.

When to Use TMM vs RCWA

  • TMM: Stack design, BARL thickness optimization, fast parameter sweeps (~3 ms per sweep). Best for thin-film interference analysis where lateral patterning is not the primary concern.
  • RCWA: Full 2D diffraction effects, absolute QE prediction, cross-pixel crosstalk analysis, microlens design (~15 s per sweep). Required when lateral structure (microlens, Bayer pattern, DTI) significantly affects the result.

Runtime Comparison

SolverWavelengthsRuntimeSpeedup
TMM212.9 ms5400x
grcwa210.1 s157x
torcwa2115.7 s1x

TMM is approximately 5400x faster than torcwa, making it the preferred solver for iterative design loops. grcwa is 157x faster than torcwa for the same RCWA calculation, benefiting from NumPy's optimized linear algebra on CPU. torcwa's PyTorch backend is optimized for GPU acceleration, so the CPU-only comparison understates its performance on CUDA devices.

Solver Compatibility Notes

meent Numerical Stability

meent 0.12.0 has a known numerical instability for multi-layer 2D structures: R+T > 1 occurs for stacks with 2 or more patterned layers. Single-layer simulations are correct. This is under investigation.

FDTD Material Absorption

The flaport fdtd solver uses per-voxel conductivity-based damping to model material absorption (imaginary permittivity). Combined with two-pass reference normalization, this achieves absorption agreement with RCWA within 3% across the visible spectrum.

Energy Conservation

Energy conservation (R + T + A = 1) is a fundamental physical constraint and serves as a self-consistency check for each solver.

Solvermax |1 - (R+T+A)|Notes
TMM1.11 x 10^-16Machine precision (analytical transfer matrices)
torcwa0.0000S-matrix formulation inherently conserves energy
grcwa0.0000S-matrix formulation inherently conserves energy

All three solvers satisfy energy conservation to machine precision, confirming that the numerical implementations are correct regardless of the physical differences in their modeling assumptions.

Execution Environment

Platform    : macOS (Darwin 25.2.0, Apple Silicon)
Python      : 3.11
PyTorch     : 2.5.0 (torcwa backend)
NumPy       : (grcwa backend)
RCWA Order  : [3, 3] (49 harmonics)
Grid        : 64 x 64
Lens Slices : 5

Part 3: RCWA vs FDTD

This section demonstrates how to validate RCWA (grcwa) results against FDTD (flaport) for a BSI CMOS pixel, and compare direct illumination with cone illumination.

Interactive Chart

RCWA vs FDTD Cross-Solver Validation

Compare absorption, reflection, and transmission spectra from RCWA (grcwa) and FDTD (flaport) solvers, plus cone illumination comparison.

0.970.980.991.00400450500550600650700Wavelength (nm)Absorption (A)grcwa (RCWA)flaport (FDTD)
grcwa mean Absorption (A):0.9848
fdtd mean Absorption (A):0.9829
Max |grcwa - fdtd|:0.0027
Show numerical data
Wavelength (nm)grcwafdtd|grcwa - fdtd|
4000.98340.98070.0027
4200.98370.98210.0016
4400.99150.98970.0018
4600.99020.98850.0017
4800.98380.98250.0013
5000.98350.98180.0017
5200.98860.98670.0019
5400.99150.99020.0013
5600.98810.98640.0017
5800.98120.97970.0015
6000.97620.97450.0017
6200.97580.97350.0023
6400.97920.97700.0022
6600.98400.98210.0019
6800.98760.98540.0022
7000.98820.98600.0022

Why Cross-Solver Validation?

RCWA and FDTD solve Maxwell's equations with fundamentally different approaches:

RCWAFDTD
DomainFrequency domainTime domain
PeriodicityInherently periodicRequires PML boundaries
StrengthsFast for thin-film stacksHandles arbitrary geometry
ConvergenceFourier orderGrid spacing + runtime

When both methods agree on absorption/reflection/transmission spectra, it provides strong confidence in the physical validity of the simulation.

Running the Validation

bash
PYTHONPATH=. python3.11 scripts/validate_rcwa_vs_fdtd.py

This script runs three experiments:

Experiment 1: Normal Incidence Sweep

Compares grcwa (fourier_order=[5,5]) vs fdtd_flaport (dx=0.015um, 500fs, pml=20) across 400-700nm. The FDTD solver uses per-voxel absorption damping and two-pass reference normalization for accurate R/T/A extraction.

Acceptance criterion: max |A_grcwa - A_fdtd| < 5%

Experiment 2: Cone Illumination

Compares three illumination conditions using grcwa:

  • Direct: Normal incidence (θ=0°)
  • Cone F/2.0 CRA=0°: 19-point Fibonacci sampling, cosine weighting
  • Cone F/2.0 CRA=15°: Same cone with 15° chief ray angle

Experiment 3: RCWA Cross-Check

Validates grcwa vs torcwa with identical cone illumination to ensure RCWA solver consistency.

Acceptance criterion: max |A_grcwa - A_torcwa| < 5%

Using ConeIlluminationRunner

python
from compass.runners.cone_runner import ConeIlluminationRunner

config = {
    "pixel": { ... },  # pixel stack config
    "solver": {"name": "grcwa", "type": "rcwa", "params": {"fourier_order": [5, 5]}},
    "source": {
        "wavelength": {"mode": "sweep", "sweep": {"start": 0.40, "stop": 0.70, "step": 0.02}},
        "angle": {"theta_deg": 0.0, "phi_deg": 0.0},
        "polarization": "unpolarized",
        "cone": {
            "cra_deg": 0.0,       # Chief ray angle
            "f_number": 2.0,       # Lens F-number
            "sampling": {"type": "fibonacci", "n_points": 19},
            "weighting": "cosine",
        },
    },
    "compute": {"backend": "cpu"},
}

result = ConeIlluminationRunner.run(config)

The runner:

  1. Generates angular sampling points from ConeIllumination
  2. For each angle: sets up source with (θ, φ) and runs the solver
  3. Accumulates weighted R, T, A and per-pixel QE
  4. Returns a single SimulationResult with the weighted average

Convergence Tips

If RCWA and FDTD results diverge:

  • FDTD: Increase runtime (500→1000fs), decrease grid_spacing (0.015→0.01um), increase pml_layers (20→30)
  • RCWA: Increase fourier_order ([5,5]→[9,9]→[13,13])

Key FDTD accuracy factors:

  1. Per-voxel absorption damping: The flaport library only supports real-valued permittivity on the grid. COMPASS computes equivalent conductivity from the imaginary permittivity and applies per-voxel E-field damping after each timestep: damping = (1 - α) / (1 + α) where α = σ·dt / (2·ε₀·εᵣ).
  2. Two-pass reference normalization: A vacuum reference simulation establishes incident power (P_inc). The structure run's reflection is computed as excess upward flux relative to the reference, eliminating soft-source artifacts.
  3. Sufficient runtime: At least 500 fs is recommended for the CW source to reach steady state in the 3 um silicon structure.

Part 4: Solver Selection Guide

Use the following decision tree to choose the right solver for your simulation:

1. Do you only have uniform (unpatterned) layers?

Yes → Use TMM

  • ~1000x faster than RCWA, ~5000x faster than FDTD
  • Ideal for BARL thickness optimization, stack design, fast parameter sweeps
  • Runtime: ~3 ms for a full wavelength sweep (21 wavelengths)
  • Limitation: Cannot model microlens focusing, Bayer pattern diffraction, or DTI confinement

2. Do you have periodic 2D patterned layers (microlens, color filter grid, DTI)?

Yes → Use RCWA

Choose the specific RCWA solver based on your needs:

SolverBackendKey StrengthsBest For
torcwaPyTorchGPU acceleration, S-matrix stability, TF32 disabled for precisionProduction GPU runs, large Fourier orders
grcwaNumPyAutograd support, fast on CPU, critical for projectInverse design, CPU-only environments, cross-validation
meentPyTorch/JAXMulti-backend flexibilityExperimental, single-layer structures only (see stability note)
fmmaxJAX4 FMM vector formulationsResearch, advanced Fourier factorization studies

3. Do you have non-periodic or complex 3D geometries?

Yes → Use FDTD

SolverBackendKey StrengthsBest For
fdtd_flaportPyTorchSimple API, differentiableQuick FDTD prototyping
fdtdzJAXEfficient z-propagationJAX-based workflows
fdtdxJAXMulti-GPU, differentiableLarge-scale inverse design
meepC++/PythonMature, feature-richComplex geometries, reference solutions

Quick Reference

ScenarioRecommended SolverApproximate Runtime
BARL thickness sweep (1000 configs)TMM~3 seconds total
Single wavelength QE (2D pixel)grcwa or torcwa0.5-15 seconds
Full wavelength sweep (2D pixel)grcwa (CPU) / torcwa (GPU)0.1-16 seconds
Inverse design optimizationgrcwa (autograd)Minutes per iteration
Cross-validation referenceRun TMM + 2 RCWA solversCompare results
Non-periodic structuremeep or fdtdxMinutes to hours