Comprehensive Survey of Open-Source RCWA / FDTD Optical Solvers

Survey date: 2026-02-11 | For COMPASS project reference


1. RCWA Solvers

1.1 Solvers Already Integrated in COMPASS

| Solver | Language | GPU | AD | License | Stars | Status | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| torcwa | Python/PyTorch | CUDA | ✅ PyTorch | LGPL | ~171 | ⚠️ Low activity (since 2023) | S-matrix, suitable for metasurface inverse design. Development stalled |
| grcwa | Python/autograd | | ✅ autograd | GPL | ~94 | ❌ Discontinued (since 2020) | CPU only. Topology optimization. Effectively abandoned |
| meent | Python (NumPy/JAX/PyTorch) | JAX/PyTorch | ✅ JAX/PyTorch | MIT | ~112 | ✅ Active | 3 backends supported, best for ML integration. Developed by a Korean company (KC-ML2) |

Assessment: meent is the best in terms of licensing (MIT), maintenance, and flexibility. grcwa needs reassessment for long-term integration viability.

1.2 Integration Candidates (High Priority)

| Solver | Language | GPU | AD | License | Stars | Status | Key Strengths |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fmmax | Python/JAX | JAX | | MIT | ~137 | ✅ Active (Meta) | State-of-the-art vector FMM (Li inverse rule extension), Brillouin zone integration, batching support. AR/VR optics |
| torchrdit | Python/PyTorch | CUDA | | GPL-3.0 | ~11 | ✅ Active | No eigenvalue decomposition required (R-DIT), 16.2x speedup over conventional RCWA, GDS import/export |
| S4 (phoebe-p fork) | C++/Lua/Python | | | GPL-2.0 | ~166 | ⚠️ Fork active | RCWA reference implementation, Li inverse rule, Solcore/RayFlare integration. 25-year history |

fmmax details:

  • Developed by Meta Reality Labs for AR/VR diffractive optics design
  • 4 vector FMM formulations (Pol, Normal, Jones, Jones-direct) → best convergence
  • JAX JIT compilation + batching for simultaneous computation of multiple configurations
  • Anisotropic and magnetic material support

torchrdit details:

  • R-DIT (Rigorous Diffraction Interface Theory) eliminates eigenvalue decomposition, the traditional RCWA bottleneck
  • PyTorch-based, so an interface similar to the existing torcwa code is possible
  • Fast Fourier Factorization (POL, NORMAL, JONES, JONES_DIRECT)
  • Automatic permittivity fitting for dispersive materials
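
To make the "eigenvalue decomposition bottleneck" concrete, the sketch below estimates how the per-layer eigenproblem grows with the Fourier truncation order in conventional RCWA. It assumes a 2D-periodic layer with orders -N..+N retained along each lattice direction and two tangential field components per harmonic, and it assumes the usual cubic cost of a dense eigendecomposition. The function names are illustrative only and do not come from torchrdit or any other solver listed here.

```python
def harmonics_2d(order: int) -> int:
    """Retained Fourier harmonics for a 2D-periodic layer.

    Assumption: orders -order..+order are kept along each lattice direction.
    """
    return (2 * order + 1) ** 2


def eigenproblem_size(order: int) -> int:
    """Dimension of the dense per-layer eigenproblem.

    Assumption: two tangential field components per retained harmonic.
    """
    return 2 * harmonics_2d(order)


def relative_eig_cost(order: int, reference_order: int) -> float:
    """Cost ratio between two truncation orders, assuming O(n^3) dense eigensolves."""
    ratio = eigenproblem_size(order) / eigenproblem_size(reference_order)
    return ratio ** 3


if __name__ == "__main__":
    # Print matrix size and cost relative to order 3 for a few truncation orders.
    for n in (3, 5, 9):
        print(n, harmonics_2d(n), eigenproblem_size(n), relative_eig_cost(n, 3))
```

Under these assumptions the matrix dimension grows with the fourth power of the truncation order once the cubic solve is included twice over (quadratic in harmonics, cubic in matrix size), which is why removing the eigensolve matters most at high orders.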

1.3 Special Purpose / Reference

| Solver | Language | GPU | License | Stars | Key Features |
| --- | --- | --- | --- | --- | --- |
| inkstone | Python/NumPy | | AGPL-3.0 | ~63 | Tensor permittivity/permeability (anisotropic, magneto-optic, gyromagnetic). Partial recomputation optimization |
| nannos | Python | | GPL-3.0 | ~20 | Multiple FMM formulations, AD, GPU acceleration. GitLab hosted |
| rcwa_tf | Python/TensorFlow | CUDA | BSD-3 | ~51 | TF-based, Lorentzian broadening (gradient stabilization), batch optimization |
| rcwa (edmundsj) | Python | | MIT | ~134 | Built-in refractiveindex.info DB, TMM+RCWA, ellipsometry |
| EMUstack | Fortran/Python | | GPL-3.0 | ~28 | Hybrid 2D-FEM + scattering matrix. Strong for metallic/plasmonic structures |
| MESH | C++ | ❌ (MPI) | GPL-3.0 | ~33 | RCWA + thermal radiation transfer (near-field/far-field). Stanford Fan Group |
| RETICOLO | MATLAB | | Freeware | N/A | Industry standard (Zeiss, Intel, Apple, Samsung). V10 (2025.01) anisotropy support. 25-year history |
| rcwa4d | Python | | MIT | ~40 | Incommensurate periodicity (twisted bilayer/moiré structures). Stanford Fan Group |
| RCWA.jl | Julia | CUDA | GPL-3.0 | ~46 | S-matrix + ETM, eigenvalue-free algorithm, CUDA 5x speedup |
| EMpy | Python | | MIT | ~219 | TMM + RCWA + mode solver. Most GitHub stars |

2. FDTD Solvers

2.1 Solvers Already Integrated in COMPASS

| Solver | Language | GPU | AD | License | Stars | Status | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Meep | C++/Python | | ✅ (adjoint) | GPL-2.0+ | ~1,500 | ✅ Active | Most mature open-source FDTD. MPI parallelization. Lack of GPU support is a drawback |
| flaport/fdtd | Python/PyTorch | CUDA | ✅ PyTorch | MIT | ~650 | ⚠️ Low activity | For education/prototyping. Simple API. Lacks advanced material models |
| fdtdz | C++/CUDA/JAX | CUDA | ✅ JAX | MIT | ~146 | ✅ Active (Google) | ~100x speed vs Meep. Limited to 2.5D. Only simple dielectrics supported |

Assessment: Meep is the general-purpose reference, fdtdz is speed-focused (2.5D), and flaport/fdtd is educational. COMPASS still lacks a general-purpose 3D GPU FDTD solver.

2.2 Integration Candidates (High Priority)

| Solver | Language | GPU | AD | License | Stars | Status | Key Strengths |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FDTDX | Python/JAX | JAX (multi-GPU) | | MIT | ~203 | ✅ Very active | Optimal for large-scale 3D inverse design. Multi-GPU. Memory-efficient gradients using Maxwell time-reversal. JOSS published |
| Khronos.jl | Julia | CUDA/ROCm/Metal/OneAPI | | MIT | ~66 | ✅ Active (Meta) | Multi-vendor GPU (NVIDIA+AMD+Apple+Intel). Pure Julia. Differentiable |
| ceviche | Python/autograd | | | MIT | ~390 | ⚠️ Low activity | Pioneer of differentiable EM. 2D FDFD+FDTD. Stanford Fan Group |

FDTDX details:

  • JAX-based fully differentiable 3D FDTD
  • Multi-GPU scaling makes simulations with billions of grid cells possible
  • Memory-efficient gradients leveraging time-reversibility of Maxwell's equations
  • Published in JOSS (Journal of Open Source Software) (2025)
  • MIT license + active development → favorable for long-term integration
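
The idea behind gradients that exploit time-reversibility can be illustrated with a toy model: in a lossless leapfrog update, earlier fields can be recomputed by running the update backwards, so they do not all have to be stored for the backward pass. The sketch below is a 1D, lossless, periodic-boundary example in normalized units. It is not FDTDX code, all function names are hypothetical, and a real solver with absorbing boundaries or lossy materials needs extra bookkeeping that this sketch does not attempt.

```python
import math


def gaussian_pulse(n, center, width):
    """Initial E-field: a Gaussian pulse on an n-cell grid."""
    return [math.exp(-((i - center) / width) ** 2) for i in range(n)]


def step_forward(e, h, courant):
    """One leapfrog step (H update, then E update) with periodic boundaries."""
    n = len(e)
    for i in range(n):
        h[i] += courant * (e[(i + 1) % n] - e[i])
    for i in range(n):
        e[i] += courant * (h[i] - h[i - 1])  # h[-1] wraps around periodically


def step_backward(e, h, courant):
    """Exact algebraic inverse of step_forward: undo E first, then H."""
    n = len(e)
    for i in range(n):
        e[i] -= courant * (h[i] - h[i - 1])
    for i in range(n):
        h[i] -= courant * (e[(i + 1) % n] - e[i])


def roundtrip(steps=500, n=200, courant=0.5):
    """Run forward, then backward. Return (field change, reconstruction error)."""
    e0 = gaussian_pulse(n, center=n / 2, width=10.0)
    e, h = list(e0), [0.0] * n
    for _ in range(steps):
        step_forward(e, h, courant)
    moved = max(abs(a - b) for a, b in zip(e, e0))
    for _ in range(steps):
        step_backward(e, h, courant)
    error = max(abs(a - b) for a, b in zip(e, e0))
    return moved, error


if __name__ == "__main__":
    moved, error = roundtrip()
    print("max field change after forward run:", moved)
    print("max reconstruction error after reversal:", error)
```

The reconstruction error is limited to floating-point rounding, whereas a naive reverse-mode approach would keep the field at every time step in memory.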

Khronos.jl details:

  • Developed by Meta Research
  • The only solver in this survey with multi-vendor GPU support (CUDA, ROCm, Metal, OneAPI)
  • Vendor-independent GPU code based on KernelAbstractions.jl
  • DFT monitors + convergence-based automatic termination
  • Julia ecosystem dependency is a drawback
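
Convergence-based termination can be sketched as a decay test on a monitored quantity: the run stops once the recent signal has fallen below a small fraction of its historical peak. The criterion, the parameter names `decay_by` and `window`, and the synthetic decaying signal below are all assumptions made for illustration. The actual criterion used by Khronos.jl is not described in this survey and may differ.

```python
import math


def fields_decayed(energy_history, decay_by=1e-6, window=50):
    """Return True once the largest value in the latest window is at most
    decay_by times the peak seen so far (hypothetical stopping rule)."""
    if len(energy_history) < window:
        return False
    peak = max(energy_history)
    if peak == 0.0:
        return False  # nothing has been excited yet
    recent = max(energy_history[-window:])
    return recent <= decay_by * peak


def run_until_decayed(max_steps=10_000):
    """Drive the stopping rule with a synthetic, exponentially decaying signal."""
    history = []
    for step in range(max_steps):
        history.append(math.exp(-0.02 * step))  # stand-in for monitored energy
        if fields_decayed(history):
            return step + 1  # number of steps actually run
    return max_steps


if __name__ == "__main__":
    print("steps run before termination:", run_until_decayed())
```

Checking a window rather than a single sample avoids stopping at a momentary zero crossing of an oscillating field.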

2.3 Special Purpose / Reference

| Solver | Language | GPU | License | Stars | Key Features |
| --- | --- | --- | --- | --- | --- |
| openEMS | C++/MATLAB/Python | ❌ (OpenMP/MPI) | GPL-3.0 | ~628 | EC-FDTD, cylindrical coordinates, RF/antenna/microwave specialized |
| gprMax | Python/Cython | CUDA | GPL-3.0 | ~788 | GPR (Ground Penetrating Radar) specialized, CUDA 30x speedup. Soil models |
| EMOPT | Python/C | ❌ (MPI) | BSD-3 | ~110 | FDFD (2D/3D) + CW-FDTD, shape optimization (boundary smoothing), adjoint method |
| fdtd3d | C++ | CUDA/MPI | GPL-3.0 | ~150 | MPI+OpenMP+CUDA, cross-architecture (x64/ARM/RISC-V/Wasm) |
| GSvit | C/C++ | CUDA | GPL-2.0 | N/A | Nanoscale optics (SNOM, roughness), GPU accelerated |
| Luminescent.jl | Julia | CUDA | MIT | ~60 | Differentiable FDTD (Zygote.jl), semiconductor photonics+acoustics+RF |
| Angora | C++ | | GPL | Small | Biomedical scattering specialized |

2.4 Commercial / Non-Open-Source (Reference)

| Solver | Notes |
| --- | --- |
| Tidy3D (Flexcompute) | Python client is LGPL but computation engine is commercial cloud. Cannot run locally. Very fast |
| Lumerical (Ansys) | Fully commercial. Industry standard GUI. CUDA GPU solver. Academic discounts available |

3. Technology Trend Analysis

3.1 Differentiable EM Simulation

Differentiable simulation is the biggest trend of 2024-2026. Built-in automatic differentiation (AD) is becoming essential for inverse design and topology optimization.

| Generation | Representative Solvers | AD Framework | Performance |
| --- | --- | --- | --- |
| 1st generation (2019) | ceviche, grcwa | autograd (CPU) | Slow, 2D |
| 2nd generation (2021-23) | torcwa, flaport/fdtd | PyTorch | GPU accelerated, limited 3D |
| 3rd generation (2024-26) | fmmax, FDTDX, fdtdz, meent | JAX | JIT+multi-GPU, large-scale 3D |

Conclusion: The JAX ecosystem is emerging as the mainstream for EM solvers. PyTorch-based solvers remain viable, but JAX's combination of JIT compilation, vmap, and pmap is better suited to EM simulation.

3.2 GPU Acceleration Status

| Approach | Solver Examples | Speedup |
| --- | --- | --- |
| PyTorch CUDA | torcwa, flaport | 5-20x |
| JAX JIT+CUDA | fmmax, FDTDX, fdtdz, meent | 10-100x |
| Custom CUDA kernels | fdtdz | ~100x (vs Meep) |
| Julia CUDA.jl | Khronos.jl, RCWA.jl | 5-10x |
| Multi-GPU | FDTDX, Khronos.jl | Linear scaling |

3.3 License Distribution

| License | Solver Count | Commercial Use |
| --- | --- | --- |
| MIT | 10 (meent, fmmax, fdtdz, FDTDX, ceviche, rcwa4d, EMpy, flaport, Khronos.jl, Luminescent.jl) | ✅ Free |
| BSD-3 | 2 (rcwa_tf, EMOPT) | ✅ Free |
| GPL/LGPL | 12+ (torcwa, grcwa, S4, nannos, RCWA.jl, Meep, openEMS, gprMax, torchrdit, etc.) | ⚠️ Restricted |
| AGPL | 1 (inkstone) | ❌ Very restricted |
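
The license table can be applied mechanically when shortlisting solvers. The sketch below maps a license string to the commercial-use category from the table and filters for permissively licensed solvers. The mapping mirrors the table only, the solver subset is a sample taken from this survey, and the helper names are hypothetical.

```python
# Commercial-use category per license family, as stated in the table above.
COMMERCIAL_USE = {
    "MIT": "free",
    "BSD-3": "free",
    "LGPL": "restricted",
    "GPL": "restricted",
    "AGPL": "very restricted",
}

# A sample of solvers and licenses taken from this survey.
SOLVER_LICENSES = {
    "meent": "MIT",
    "fmmax": "MIT",
    "EMOPT": "BSD-3",
    "torcwa": "LGPL",
    "Meep": "GPL-2.0+",
    "torchrdit": "GPL-3.0",
    "inkstone": "AGPL-3.0",
}


def license_family(license_name):
    """Reduce a versioned license string (e.g. 'GPL-3.0') to its family."""
    upper = license_name.upper()
    for family in ("AGPL", "LGPL", "GPL", "MIT", "BSD-3"):
        if upper.startswith(family):
            return family
    return "unknown"


def commercial_use(license_name):
    """Look up the commercial-use category, or 'unknown' if unmapped."""
    return COMMERCIAL_USE.get(license_family(license_name), "unknown")


def permissive_solvers(solvers):
    """Names of solvers whose license falls in the 'free' category."""
    return sorted(name for name, lic in solvers.items()
                  if commercial_use(lic) == "free")


if __name__ == "__main__":
    print(permissive_solvers(SOLVER_LICENSES))
```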

4. COMPASS Integration Recommendations

Tier 1

| Solver | Type | Rationale |
| --- | --- | --- |
| fmmax | RCWA | MIT, Meta-backed, best convergence (vector FMM), JAX batching. Potential synergy with meent's JAX backend |
| FDTDX | FDTD | MIT, multi-GPU 3D, fully differentiable, JOSS published. Complements fdtdz's 2.5D limitation |

Tier 2

| Solver | Type | Rationale |
| --- | --- | --- |
| torchrdit | RCWA (R-DIT) | Major speedup by eliminating eigenvalue decomposition. GPL is an obstacle |
| S4 (phoebe-p fork) | RCWA | C++ performance reference implementation. Useful for verification. GPL |
| ceviche | FDTD/FDFD | 2D differentiable EM. For rapid prototyping. MIT |

Tier 3: Long-Term Watch

| Solver | Type | Rationale |
| --- | --- | --- |
| Khronos.jl | FDTD | Multi-vendor GPU is attractive but Julia dependency |
| inkstone | RCWA | Unique tensor permittivity support but AGPL |
| Luminescent.jl | FDTD | Julia differentiable FDTD. Still early stage |

Existing Integrated Solver Assessment

| Solver | Retention Recommendation | Notes |
| --- | --- | --- |
| meent | Actively retain | MIT, active, 3 backends, best flexibility |
| torcwa ⚠️ | Retain but monitor | LGPL, development stalled. Potential replacement by meent |
| grcwa | Consider deprecation | GPL, discontinued since 2020. CPU only. Inferior |
| Meep | Actively retain | General-purpose reference. Lack of GPU support is regrettable |
| flaport ⚠️ | Retain but monitor | MIT, low activity. Only valuable for education/prototyping |
| fdtdz | Retain | MIT, Google-backed, extreme speed. Acknowledge 2.5D limitation |

5. Complete Solver Summary Tables

RCWA (20 solvers)

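| # | Solver | Language | GPU | AD | License | Stars | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | meent | Py (NumPy/JAX/PyTorch) | | | MIT | 112 | ✅ Active |
| 2 | fmmax | Py/JAX | | | MIT | 137 | ✅ Active |
| 3 | torcwa | Py/PyTorch | | | LGPL | 171 | ⚠️ |
| 4 | S4 | C++/Lua/Py | | | GPL-2.0 | 166 | ⚠️ Fork |
| 5 | EMpy | Py | | | MIT | 219 | |
| 6 | rcwa (edmundsj) | Py | | | MIT | 134 | ⚠️ |
| 7 | grcwa | Py/autograd | | | GPL | 94 | ❌ Discontinued |
| 8 | inkstone | Py/NumPy | | | AGPL | 63 | ⚠️ |
| 9 | rcwa_tf | Py/TF | | | BSD-3 | 51 | ⚠️ |
| 10 | RCWA.jl | Julia | | | GPL-3.0 | 46 | |
| 11 | rcwa4d | Py | | | MIT | 40 | ⚠️ |
| 12 | MESH | C++ | | | GPL-3.0 | 33 | ⚠️ |
| 13 | EMUstack | Fortran/Py | | | GPL-3.0 | 28 | ⚠️ |
| 14 | nannos | Py | | | GPL-3.0 | 20 | ⚠️ |
| 15 | torchrdit | Py/PyTorch | | | GPL-3.0 | 11 | |
| 16 | RETICOLO | MATLAB | | | Freeware | N/A | |
| 17 | PPML | MATLAB | | | Free | N/A | ⚠️ |
| 18-20 | 3 educational solvers | Py | | | Various | <10 | ⚠️ |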

FDTD (17 solvers)

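| # | Solver | Language | GPU | AD | License | Stars | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Meep | C++/Py | | ✅ adj | GPL-2.0+ | 1,500 | ✅ Active |
| 2 | gprMax | Py/Cython | CUDA | | GPL-3.0 | 788 | |
| 3 | flaport/fdtd | Py/PyTorch | | | MIT | 650 | ⚠️ |
| 4 | openEMS | C++ | | | GPL-3.0 | 628 | |
| 5 | ceviche | Py | | | MIT | 390 | ⚠️ |
| 6 | FDTDX | Py/JAX | ✅ Multi | | MIT | 203 | ✅ Very active |
| 7 | Tidy3D | Py | ☁️ Cloud | | LGPL/Commercial | 164 | ✅ (Non-open-source) |
| 8 | fdtd3d | C++ | CUDA/MPI | | GPL-3.0 | 150 | |
| 9 | fdtdz | C++/CUDA/JAX | | | MIT | 146 | |
| 10 | EMOPT | Py/C | | ✅ adj | BSD-3 | 110 | ⚠️ |
| 11 | PhotonTorch | Py/PyTorch | | | MIT | 81 | ⚠️ (Circuit sim) |
| 12 | Khronos.jl | Julia | ✅ Multi-vendor | | MIT | 66 | |
| 13 | Luminescent.jl | Julia | | | MIT | 60 | |
| 14 | GSvit | C/C++ | CUDA | | GPL-2.0 | N/A | |
| 15 | Angora | C++ | | | GPL | Small | ⚠️ |
| 16 | MaxwellFDTD.jl | Julia | | | N/A | Small | ⚠️ |
| 17 | REMS | Rust | | | N/A | Tiny | ❌ (1D PoC) |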

RCWA vs FDTD Solver Comparison

Compare simulated quantum efficiency (QE) curves from RCWA and FDTD solvers. Adjust pixel pitch and solver parameters to see how results and performance change.

[Figure: simulated QE (%) versus wavelength (400-700 nm) for the red, green, and blue channels, in two panels: RCWA (Fourier order = 9) and FDTD (grid = 20 nm).]

| Metric | RCWA | FDTD |
| --- | --- | --- |
| Time estimate | 137 ms | 188 ms |
| Memory | 6 MB | 3 MB |
| Periodic structures | Yes | Yes |
| Arbitrary geometry | Limited | Yes |

Agreement: max |Delta QE| = 2.2%, average |Delta QE| = 0.9% (status: good agreement).
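
The two agreement figures are the maximum and the mean of the pointwise absolute QE difference between the two solvers, evaluated at the same wavelengths. A minimal sketch follows. The function name and the sample QE values are hypothetical and are not data from the comparison above.

```python
def qe_agreement(qe_a, qe_b):
    """Return (max |Delta QE|, avg |Delta QE|) for two QE curves in percent.

    Both curves must be sampled at the same wavelengths.
    """
    if len(qe_a) != len(qe_b) or not qe_a:
        raise ValueError("QE curves must be non-empty and of equal length")
    deltas = [abs(a - b) for a, b in zip(qe_a, qe_b)]
    return max(deltas), sum(deltas) / len(deltas)


if __name__ == "__main__":
    # Hypothetical QE samples (percent) at four shared wavelengths.
    rcwa_qe = [40.0, 55.0, 62.0, 48.0]
    fdtd_qe = [41.0, 54.5, 60.0, 48.5]
    max_delta, avg_delta = qe_agreement(rcwa_qe, fdtd_qe)
    print(f"Max |Delta QE|: {max_delta:.1f}%  Avg |Delta QE|: {avg_delta:.1f}%")
```

If the solvers report QE on different wavelength grids, the curves would first need to be interpolated onto a common grid, which this sketch leaves out.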

Note: As of 2026, no Rust-based RCWA solver exists. For FDTD, only a 1D PoC (REMS) exists.