Integrated lithium niobate photonics for sub-ångström snapshot spectroscopy - Nature


The fabricated lithium niobate modulator comprises two DBRs and an SR-LN layer. The modulator uses the electro-optical properties of lithium niobate (Supplementary Fig. 11g,h) to achieve a reconfigurable modulation capability for spatial and spectral data. Supplementary Fig. 1a shows the electric field distribution in and around the modulator after a voltage is applied. Within the modulator, the potentials on the upper and lower surfaces are essentially uniform (Supplementary Fig. 1b). When the potential difference between the two electrodes reaches 30 V, the potential difference inside the modulator is about 1 V (Supplementary Fig. 11e,f). The electric field is oriented almost perpendicular to the modulator plane, with only a small component parallel to it (Supplementary Fig. 12c,d). The electric field is inherently non-uniform over the entire chip surface. However, for each individual pixel (with lateral dimensions of only 5.5 μm × 5.5 μm), the local electric field can be considered approximately uniform owing to the small pixel size. Moreover, the uniformity of the voltage across the entire chip is not critical for our application: the goal is not to apply the same voltage to all pixels but to ensure that the spectral encoding at each pixel varies as the applied voltage changes. The electro-optic response of lithium niobate is highly stable and repeatable. During calibration, we measured and stored the spectral response of each pixel at each of the 11 voltage levels, effectively accounting for any spatial variations in the electric field or device response.

The SR-LN layer has an average thickness of 500 μm and lateral dimensions of 19.62 mm × 19.62 mm. The surface morphology of the SR-LN exhibits random undulations with a maximum height variation of 15 μm (Supplementary Fig. 13). At the pixel level, the thickness and height distributions follow a two-dimensional Gaussian profile (Supplementary Fig. 14a). The two DBRs, with identical lateral dimensions, were deposited onto the upper and lower surfaces of the lithium niobate chip by chemical vapour deposition. The upper DBR consists of three uniform layers: 40 nm of SiO₂, 40 nm of SiN and 100 nm of SiO₂. The lower DBR comprises three uniform layers: 100 nm of SiO₂, 30 nm of SiN and 30 nm of SiO₂. The thicknesses listed above correspond to the average target values defined during fabrication (Supplementary Fig. 11) and were selected to achieve both high spectral resolution and high total optical transmittance (Supplementary Note 6 and Supplementary Fig. 16). A peak transmittance of over 99% for wavelengths of 400-1,000 nm was achieved through the constructive interference of the lithium niobate film under normal-incidence illumination. All layers were fabricated with thicknesses controlled to within 10 nm. Slight thickness variations at different positions further enhance the overall spatial anisotropy of the film. The thickness of the DBRs and the central SR-LN layer varies at each pixel, resulting in differences in the overall transmittance envelope across a wide spectral range (Supplementary Fig. 14c), with transmission peaks within narrower spectral bands shifted relative to each other (Supplementary Fig. 14d). The correlation between the transmission spectra of each pixel and its neighbouring pixels shows that, for an area no smaller than 25 × 25 pixels, the correlation coefficient is below 0.8 (Supplementary Fig. 14b).
This demonstrates a high level of spectral encoding independence at each pixel and provides a robust physical foundation for solving linear equations at a spectral resolving power of 12,000 at 400-1,000 nm with spatial pixels of 4,032 × 3,072 (Supplementary Fig. 8).
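The encoding independence quoted above is measured by the correlation coefficient between per-pixel transmission spectra. A minimal numpy sketch of this check, using random arrays as stand-ins for the measured spectra, is:

```python
import numpy as np

rng = np.random.default_rng(3)

L = 256                 # number of spectral samples (toy value)
t_a = rng.random(L)     # transmission spectrum of one pixel
t_b = rng.random(L)     # transmission spectrum of a neighbouring pixel

# Pearson correlation between the two transmission spectra; values well
# below 1 indicate largely independent spectral encodings.
r = float(np.corrcoef(t_a, t_b)[0, 1])
```

In the device itself, this coefficient is evaluated between each pixel and its neighbours over the calibrated spectra rather than random draws.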

RAFAEL is a computational spectroscopic imager. Its acquisition process compresses the spectral cube data into a two-dimensional greyscale image using a pixel-wise spectral transmission curve T corresponding to a lithium niobate modulator. The measurement process is given by

Y_s(x, y) = Σ_λ M_s(x, y, λ) X(x, y, λ) + G,    (1)

where Y_s(x, y) represents the sensor-acquired greyscale spectral projection in voltage state s of the lithium niobate, X(x, y, λ) denotes the target three-dimensional hyperspectral data, M_s is the observation matrix of RAFAEL in voltage state s and G accounts for measurement noise. The observation matrix M_s is formulated as the element-wise product, along the wavelength dimension, of the effective three-dimensional spectral transmittance matrix T and the CMOS quantum efficiency (QE) curve. The term T here is an effective approximation of the transmittance T(x, y, λ, θ): it incorporates the impact of multi-angle incidence with angular distribution θ at sensor pixel (x, y) (Supplementary Fig. 7) and pixel crosstalk caused by the optics (Supplementary Table 3 and Supplementary Fig. 4), as detailed in Supplementary Notes 3 and 4.
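The measurement process described above can be sketched in numpy; all shapes and values below are toy stand-ins for the real cube, transmittance and QE curve:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, L = 8, 8, 64            # toy spatial size and number of spectral channels
X = rng.random((H, W, L))     # target hyperspectral cube X(x, y, lambda)
T = rng.random((H, W, L))     # pixel-wise spectral transmittance T(x, y, lambda)
qe = rng.random(L)            # CMOS quantum-efficiency curve along wavelength

# Observation matrix: element-wise product of transmittance and the QE curve.
M = T * qe[None, None, :]

# Greyscale projection: integrate the encoded cube over wavelength, add noise G.
G = 0.01 * rng.standard_normal((H, W))
Y = (M * X).sum(axis=2) + G
```

Each voltage state yields a different T, and hence a different M and a different greyscale projection Y of the same cube X.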

The computational reconstruction process aims to recover X from Y, which is an ill-posed inverse problem, as the number of equations is typically less than the number of unknowns. Compressive sensing theory indicates that accurate recovery of X is feasible when the column vectors of the observation matrix M are independently and identically distributed (Supplementary Note 2). Moreover, increasing the number of equation constraints enhances reconstruction accuracy. RAFAEL achieves highly independent spectral encoding by leveraging the spatial random fluctuations and cavity interference structures of lithium niobate (Supplementary Fig. 14a,b). Additionally, its reconfigurable nature enables the encoding strategy to be adjusted between frames, effectively increasing the number of constraints and improving the fidelity of spectral reconstruction (Supplementary Figs. 9 and 10). To make this high-dimensional inversion tractable, we leverage two main strategies.

Reconfigurable pixel-wise spectral encoding: Under each applied voltage, the lithium niobate film has a distinct, pixel-wise spectral transmission profile, thus effectively generating an independent set of linear equations. For a spatial target occupying B pixels (for example, a 20 × 20 region), this yields B independent equations per frame. By acquiring S different greyscale measurements under S voltage levels (S = 11 in our experiments), we obtain approximately B × S independent equations. Taking Fig. 4 as an example gives approximately 4,400 equations.
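The constraint count above is simple arithmetic; as a sanity check of the Fig. 4 example:

```python
# Count of independent constraints: B pixels per frame, S voltage states.
B = 20 * 20          # spatial support of the target (a 20 x 20 region)
S = 11               # number of distinct voltage levels in our experiments
n_equations = B * S  # approximately 4,400 equations, as in the Fig. 4 example
```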

Spectral sparsity and smoothness priors: The spectral reconstruction assumes that the underlying signal is compressible in a suitable basis. Natural and astronomical spectra typically exhibit a limited number of sharp features (for example, atomic lines), superimposed on globally smooth continua. This enables accurate recovery using compressive sensing techniques, which require fewer measurements than the ambient dimension (12,000), as long as the signal is sparse or structured in the spectral domain. In our case, approximately 4,000-5,000 independent equations are sufficient for a high-fidelity reconstruction.

To balance the reconfigurability with the snapshot architecture, RAFAEL supports two operational modes: (1) Single-frame reconstruction, where a full spectral cube is recovered from one frame acquired under one of 11 pre-calibrated voltage settings (Supplementary Fig. 9). (2) Multi-frame enhancement, where reconstruction of the current frame is coupled with up to k previous frames (0 ≤ k ≤ 10) using a masked autoencoder (MAE), effectively leveraging k + 1 distinct encoding patterns (Supplementary Fig. 10). This approach increases the number of constraints and improves the spectral fidelity while preserving the snapshot-per-frame nature (each frame remains independently interpretable and temporally aligned).

Two prototypes were developed to demonstrate the compatibility of RAFAEL with different CMOS sensors and its performance. Specifically, RAFAEL-4M integrates with a MindVision custom CMOS, and RAFAEL-12M uses a LYT800 Mono CMOS. The RAFAEL-4M prototype features a spatial resolution of 2,048 × 2,048 with a pixel size of 5.5 μm × 5.5 μm, and it offers a 12-bit depth pixel array with a read-out noise of 3-5 e⁻ and an approximate dynamic range of 60 dB. RAFAEL-12M achieves a spatial resolution of 4,032 × 3,072 with a pixel size of 2 μm × 2 μm, and it offers a 12-bit depth pixel array with a read-out noise of 0.8-2 e⁻ and an approximate dynamic range of 85 dB. Notably, the lithium niobate modulator is directly integrated onto the CMOS surface, replacing the conventional SiO₂ protective cover glass (which is typically around 0.5 mm in thickness). RAFAEL thereby preserves the overall dimensions of the sensor package compared with the original factory configuration. The spectral bandwidth of both prototypes in a single shot is 400-1,000 nm. Supplementary Table 5 lists the prototypes used in each experiment, along with specific details including spatial pixels, number of spectral channels, field of view and exposure time.

Supplementary Fig. 17 shows the step-by-step fabrication flow. The fabrication can be divided into five steps. Step 1: A lithium niobate crystal was cut and polished by chemical mechanical polishing to form a 500-μm-thick, 19.6 mm × 19.6 mm substrate with inherent surface roughness, referred to as SR-LN. Step 2: DBRs composed of alternating SiO and SiN layers were deposited onto both sides of the lithium niobate substrate using chemical vapour deposition. Step 3: Transparent indium tin oxide electrodes with a thickness of 40 nm were deposited onto both sides of the substrate by chemical vapour deposition. A central 12 mm × 12 mm region was masked during deposition to form an approximately 4-mm-wide rectangular frame-shaped electrode structure that preserves the high optical transmittance in the active area. Step 4: The protective cover glass of a commercial CMOS sensor was removed using a heat gun. Step 5: The lithium niobate modulator was mechanically bonded to the CMOS sensor using a solid-state adhesive, ensuring that the photosensitive region directly interfaced with the lithium niobate without glue intrusion. After fabrication, pixel-wise system calibration was performed to establish the system matrix, enabling direct use of RAFAEL for hyperspectral imaging.

We used a monochromator to calibrate the response of the RAFAEL prototype over a wide spectral range using a light source with an extremely narrow full-width at half-maximum (FWHM). As shown in Supplementary Fig. 18a, the light from the monochromator was divided into two parts by a fibre splitter. Some of the incident light was converted into parallel light by a collimated lens and then illuminated the entire CMOS chip. The other part of the incident light was monitored directly by a spectrometer (Ocean Optics HR-4VN400-10). The wavelength range of the monochromator (CME-TLSX300F-3G) was 400-900 nm, the FWHM was 0.5 nm and the central wavelength interval was 10 nm (Supplementary Fig. 18b). The voltage between the ends of the lithium niobate film, which was produced by a d.c. programmable power supply (Korad KA3005P), ranged from 0 V to 30 V. We controlled the monochromator, spectrometer and power supply by computer. The wavelength of the monochromator and the voltage of the power supply were scanned, and the measured grey images in each state (different voltages) were recorded for calibration. In actual imaging of divergent scenes, the observation matrix M at each pixel becomes an angular integral of the original collimated response, as described in equation (1). The matrix M under lens-based configurations preserves the inherent spatial-spectral encoding pattern, albeit with a reduced modulation depth (the contrast between the maximum and minimum transmittance values is slightly reduced due to angular averaging). To compensate for this effect and enhance robustness in practical imaging, RAFAEL can also be calibrated under different lens-based set-ups (Supplementary Notes 7 and 8). For example, in astronomical applications, we used ten known standard stars before formal observations to calibrate the full spectral imaging system, as shown in Supplementary Fig. 29.
This process refines the angular-integrated response by adjusting the pre-calibrated M obtained under collimated illumination.
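The calibration scan described above (stepping the voltage and the monochromator wavelength while recording one grey frame per state) can be sketched as follows; the three hardware hooks are hypothetical stand-ins for the power-supply, monochromator and camera drivers:

```python
import numpy as np

def calibrate(set_voltage, set_wavelength, grab_frame, voltages, wavelengths):
    """Scan voltage states and probe wavelengths, recording one frame each.

    `set_voltage`, `set_wavelength` and `grab_frame` are hypothetical driver
    callables. The result M[s, k] is the grey image recorded at voltage
    index s and wavelength index k: the raw material for the pixel-wise
    observation matrix.
    """
    states = []
    for v in voltages:
        set_voltage(v)                      # select one reconfigurable state
        frames = []
        for w in wavelengths:
            set_wavelength(w)               # narrow-FWHM probe wavelength
            frames.append(np.asarray(grab_frame(), dtype=float))
        states.append(np.stack(frames))
    return np.stack(states)
```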

Supplementary Fig. 20a is a schematic diagram of the indoor experimental set-up. We assembled RGB, RAFAEL and commercial spectral imagers with the same 24-mm prime lens. For comparison purposes, we chose RGB, RAFAEL and commercial spectral imagers with the same pixel size (5.5 μm × 5.5 μm). Note that the commercial spectroscopic imager (IMEC Ximea Snapshot Vis) used a 4 × 4 mosaic to achieve 16-channel spectral imaging, which reduced the spatial resolution. The gain for all imagers was set to 3.0, and the exposure time was 50 ms. To make the images captured by the commercial spectral imager brighter, the exposure time was increased to a maximum of 1,000 ms. The targets were placed in the sample holder. The monochromator illuminated the whiteboard of the sample holder from the back through an optical fibre and a collimating lens. The mercury lamp illuminated the whiteboard of the sample holder from the front through an optical fibre. The spectrum of the mercury lamp is shown in Supplementary Fig. 20b. To verify the high spectral resolution and accuracy, a diffuse reflector was placed in the sample holder and illuminated by both the monochromator and the mercury lamp. Given the weak light intensity of the monochromator, transmission lighting was used. Owing to the strong intensity of the mercury lamp, we used reflective lighting. To assess the accuracy of the spectral data acquired by RAFAEL, a high-precision commercial line-scan spectrometer (SPECIM V10E-HR) was used to obtain spectral data as the reference standard. We integrated the spectral dimension of the reconstructed data according to the same sampling interval as the commercial line-scan spectrometer (around 0.9 nm) and calculated the spectral fidelity with the cosine similarity.

In the astronomical spectroscopic imaging experiments, we observed stars in the northern sky from 11:00 p.m. on 29 July to 1:00 a.m. on 30 July 2024 in Lenghu Town, China (latitude 38.73° N, longitude 93.33° E). During the observation, the cloud coverage was near 0%, and the Bortle dark sky rating was class 1. The RAFAEL online observation system was constructed using two commercial lenses and a consumer-grade telescope (Celestron Stellar Deluxe DX90EQ). The experimental set-up is illustrated in Supplementary Fig. 29. Using the RAFAEL prototype, we conducted snapshot spectroscopic imaging of stars in the northern sky. We detected stars as faint as an apparent magnitude of approximately 10. The power supply, controlled by a computer, adjusted the voltage across the lithium niobate film, while the RAFAEL prototype captured raw data at each state. The prototype had an exposure time of 700 ms with a gain range of 8 to 16. The programmable d.c. power supply (Korad KA3005P) provided voltages between 0 V and 30 V in 3-V intervals, while a digital oscilloscope (Rigol DHO812) monitored real-time voltage values. After data acquisition, the data were reconstructed in real time using a computational module equipped with an RTX 4090 GPU and an Intel 14900K CPU.

All reconstructions were performed using the proposed PSAIR framework. For single-frame imaging (for example, Figs. 1 and 4 and Extended Data Figs. 1 and 2), the input bypassed the MAE module and the hyperspectral image was directly reconstructed using SPECAT. For continuous dynamic imaging (for example, Fig. 3), the current frame was first processed by the MAE, which incorporated up to ten previous frames along with the current frame, and the output was then reconstructed using SPECAT.

To visually represent multichannel spectral data, we employed a pseudocolour mapping method that converts spectral data into three channels (R, G and B). For natural scenes, we specifically selected three narrow spectral bands (630 nm, 550 nm and 460 nm, each with a 1-nm bandwidth) to generate pseudo-RGB images. This choice was made to show the hyperspectral resolution of RAFAEL and its ability to distinguish between materials that standard RGB cameras cannot. Unlike RGB imaging, which compresses spectral information, these narrow bands retain finer spectral details, demonstrating the superior material discrimination capabilities of RAFAEL.
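The narrow-band pseudo-RGB mapping for natural scenes can be sketched as below; the nearest-channel lookup and the display normalization are illustrative simplifications of the 1-nm-bandwidth extraction:

```python
import numpy as np

def pseudo_rgb(cube, wavelengths, bands=(630.0, 550.0, 460.0)):
    """Map a hyperspectral cube to pseudo-RGB by picking three narrow bands.

    Band centres follow the text (630, 550 and 460 nm); selecting the single
    nearest channel approximates the 1-nm-bandwidth extraction.
    """
    idx = [int(np.argmin(np.abs(wavelengths - b))) for b in bands]
    rgb = cube[:, :, idx].astype(float)
    return rgb / max(rgb.max(), 1e-12)  # normalize for display
```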

For stellar spectra, which exhibit distinct absorption features, we used a colour display method analogous to the Harvard classification. Additionally, to enhance the colour richness, we designed a specific pseudocolour projection matrix. The intensity of characteristic lines from hydrogen, helium and iron atoms served as the basis for classification, with the mapping equations provided below.

I_H, I_He and I_Fe are the corresponding absorption peak intensities, which can be obtained from

I_E = Σ_{i=1}^{N} α_i (Ī_i − I_i^peak),  E ∈ {H, He, Fe},

where N denotes the number of characteristic lines selected for each element, α_i denotes the weight of the ith absorption line, I_i^peak is the intensity at its absorption peak, and Ī_i is the mean intensity over the line's wavelength interval. For hydrogen atoms, we selected two characteristic absorption lines: Hα at 6,565 Å and Hβ at 4,861 Å. For helium atoms, the chosen characteristic lines were He I at 5,875 Å and He II at 4,686 Å. For iron atoms, we selected three characteristic lines associated with Fe I, specifically at 8,501 Å, 8,668 Å and 8,831 Å. A spectral range of 40 Å was used to calculate the average intensity around each of these characteristic lines. Moreover, to simultaneously reflect the relative brightness differences of celestial bodies with different apparent magnitudes, we used the following mapping to adjust the brightness of the pseudocolours C(R, G, B) of all celestial bodies:

where I is the absolute intensity of the celestial body, which is equal to the integrated value of the reconstructed spectrum along the wavelength direction. α is the enhancement coefficient, which ranged from 0.5 to 2 based on the average apparent magnitude of all celestial bodies in different regions of the starry sky. C(R, G, B) is the final displayed colour of the celestial body in the pseudocolour image. We classified the stars based on the intensity of three absorption lines, and then presented the results by analogy with the Harvard spectral classification system, dividing them into O, B, A, F, G, K and M types. As the intensity of these three absorption lines is closely related to the stage of the nuclear reactions inside the stars, we could then infer the surface temperatures of the stars.
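The per-line strength measure described above (the mean intensity over a 40 Å window around each line compared with the intensity at the absorption minimum, combined with per-line weights) can be sketched as follows; the exact combination used in the paper is not reproduced, so treat this as an illustrative assumption:

```python
import numpy as np

def line_strength(wl, flux, centers, weights, window=40.0):
    """Weighted absorption strength over N characteristic lines (a sketch).

    For each line centre, compare the mean flux over a `window`-angstrom
    interval with the flux at the absorption minimum inside that interval;
    the weighted sum of these depths serves as the element's line index.
    """
    total = 0.0
    for c, a in zip(centers, weights):
        sel = np.abs(wl - c) <= window / 2
        total += a * (flux[sel].mean() - flux[sel].min())
    return total
```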

Our MAE-decoder shares the same principles and overall framework as the MAE used in self-supervised learning. The difference is that, in this study, it is applied in the spatio-temporal dimension (compressed grey images), using interframe correlation to recover the measurement values of the same pixel at different times (that is, in different reconfigurable states) Φ(x, y, t). For measurement values ϕ(x, y, t) at several time frames, the encoded measurement values ϕ̂(x, y, t) are given by

ϕ̂(x, y, t) = CL(m(t) · ϕ(x, y, t)),

where m(t) is a random mask generated in the time domain based on the number of time frames and CL is the fully connected convolutional layer implemented through depthwise separable convolution with a 3 × 3 kernel (Supplementary Fig. 21b). As shown in Supplementary Fig. 21a, the input is the masked measured time-series images. Through pre-supervised training, the MAE-decoder can recover the measurement values of the system in different reconfigurable states and complete the interframe alignment (Supplementary Fig. 21c).
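The temporal masking step can be sketched in numpy as below; the mask ratio and the plain (untrained) averaging kernel standing in for the learned layer CL are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

T, H, W = 11, 8, 8                      # frames x spatial size (toy values)
phi = rng.random((T, H, W))             # measurements phi(x, y, t)

# Random mask m(t) generated in the time domain: drop a subset of frames.
keep = 0.5                              # assumed keep ratio
m = (rng.random(T) < keep).astype(float)
phi_masked = phi * m[:, None, None]     # masked input to the encoder

def depthwise3x3(x, k):
    """Apply one 3x3 kernel independently to each frame (zero padding)."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * xp[:, dy:dy + x.shape[1], dx:dx + x.shape[2]]
    return out

# An averaging kernel stands in for the trained depthwise separable layer CL.
encoded = depthwise3x3(phi_masked, np.full((3, 3), 1 / 9))
```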

The PSAM uses both the low-resolution data with broad wavelength intervals (approximately 10 nm) acquired for all pixels and the actual film structure of RAFAEL as inputs. Initially, a forward transmission model for each pixel is established using the transfer matrix method, governed by

T(λ, n, d) = I_t(λ)/I_i(λ) = (n_t/n_i) |A_t/A_i|²,

where T(λ, n, d) represents the transmittance at wavelength λ through a film of refractive index n and thickness d, n_t and n_i are the refractive indices of the outgoing and incident media, respectively, and A_t and A_i denote the amplitudes of the transmitted and incident light. I_t(λ) and I_i(λ) are the transmitted and incident intensities as functions of wavelength, respectively. This model was derived using a differential approach by idealizing the lateral dimensions of the finite elements. Pixel crosstalk is treated as a noise term in the equivalent model (Supplementary Note 3). These amplitudes can be obtained by solving the system of equations under the constraints of Maxwell's equations:

where H is the transfer matrix for light propagating between adjacent layers, [A, B]ᵀ represents the amplitudes of the transmitted and reflected light, and D corresponds to the boundary conditions for light propagation.
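A minimal normal-incidence transfer-matrix implementation is sketched below, assuming lossless, non-dispersive layer indices (the paper's full model also handles oblique incidence and crosstalk):

```python
import numpy as np

def transmittance(wl, n_layers, d_layers, n_in=1.0, n_out=1.0):
    """Normal-incidence transmittance of a thin-film stack.

    Multiplies the 2x2 characteristic matrix of each layer, then converts
    the stack matrix into an amplitude transmission coefficient t and the
    intensity transmittance T = (n_out / n_in) |t|^2.
    """
    M = np.eye(2, dtype=complex)
    for n, d in zip(n_layers, d_layers):
        delta = 2 * np.pi * n * d / wl          # phase thickness of the layer
        M = M @ np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                          [1j * n * np.sin(delta), np.cos(delta)]])
    t = 2 * n_in / (n_in * M[0, 0] + n_in * n_out * M[0, 1]
                    + M[1, 0] + n_out * M[1, 1])
    return (n_out / n_in) * abs(t) ** 2
```

For a single quarter-wave layer of index 2 in air this reproduces the textbook result T = 1 − ((1 − n²)/(1 + n²))² = 0.64.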

For the lithium niobate film with an SRSM structure, apart from the lithium niobate layer in the middle, the thicknesses and refractive indices of the multi-layer silicon-based films on the top and bottom surfaces are known. Consequently, an initial estimate of d is used in the forward transmission model to obtain a modelled transmitted intensity. This modelled intensity is then compared with the calibrated transmitted intensity, and an evolutionary algorithm iteratively optimizes the thickness d of the middle layer.

The evolutionary algorithm was configured to run for 100 iterations with a population size of 10 individuals per generation, a mutation amplitude of 1 nm and a fitness function set as the reciprocal of the mean squared error between the modelled and calibrated transmitted intensities. The optimized thickness d obtained after the evolutionary iterations was taken as the thickness of the middle lithium niobate layer corresponding to a single sensor pixel. By reusing the forward propagation model, high-spectral-resolution calibration data can be obtained for each pixel. Iterating this process over all pixels yields the equivalent mask.
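The evolutionary loop can be sketched as follows. The iteration count, population size and 1-nm mutation amplitude follow the configuration above; the elitism step and the `model(d)` interface are assumptions introduced for illustration:

```python
import numpy as np

def fit_thickness(target, model, d0, iters=100, pop=10, sigma=1.0, seed=0):
    """Evolutionary search for the middle-layer thickness d (a sketch).

    `model(d)` returns a modelled spectrum for thickness d; fitness is the
    reciprocal of the mean squared error against the calibrated `target`
    spectrum, so selecting the minimum MSE maximizes fitness.
    """
    rng = np.random.default_rng(seed)
    best = d0
    for _ in range(iters):
        # Mutate the current best by 1-nm-scale Gaussian perturbations.
        candidates = best + sigma * rng.standard_normal(pop)
        candidates = np.append(candidates, best)  # elitism (assumed)
        mse = [np.mean((model(d) - target) ** 2) for d in candidates]
        best = candidates[int(np.argmin(mse))]
    return best
```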

The SPECAT used in this study is described in our previous work. It was designed for high-resolution hyperspectral image reconstruction using compressed spatial-spectral measurements and a mask as inputs. It uses a cumulative-attention block (CAB) within an efficient hierarchical framework to extract features from compressed spatial-spectral details and integrates the optical path constraint to achieve the final step of spectral reconstruction. For the measurement Y(x, y) at every moment, it obtains the corresponding spectral image X(x, y, λ) through the reconstruction process:

X̂ = arg min_X ‖Y − MX‖² + τψ(X).

Here, M is the observation matrix of the system and τψ(X) represents the regularization term exploiting the characteristics of hyperspectral images. SPECAT uses a U-Net structure to connect CABs at different scales. The input data are processed through two layers of CABs and then downsampled (4 × 4 convolution) to reduce the spatial dimension to one quarter of the original size. Subsequently, the data are passed through a single-layer CAB. During upsampling, SPECAT connects to the shallow features via skip connections and then outputs the reconstructed three-dimensional hyperspectral data through another set of two CABs.

The reconstruction framework supports both multi-frame and single-frame inputs (Supplementary Note 5). When several frames are available, the MAE performs spatio-temporal-spectral coupling by compressing the current and previous k spatial-spectral greyscale frames (k ≥ 0). In the single-frame case (k = 0), no temporal reference is available, so the MAE and decoder operate solely on that frame. The output corresponds to a masked reconstruction of the original frame, without temporal coupling.

For training, we adopted three publicly available experimental datasets: the ICVL dataset, the ARAD-1K dataset and the spectroscopic survey dataset in the 17th Data Release of the Sloan Digital Sky Survey. All of these are based on experimental data. We used a single pretrained model trained on the above datasets for all experiments.

In the hyperspectral image reconstruction for the indoor experiments, the overall loss is 0.5 × root mean squared error (RMSE) of the spectral images + 0.5 × RMSE of the compressive measurements. The spectral-image RMSE is a common objective function for hyperspectral reconstruction, whereas the measurement RMSE guides the optimization of the reversible light path (it makes the reconstructed values, after passing through the system's measurement optical path, match the actually measured values). For the astronomical experiments, the generator was trained with a loss of 0.001 × L1-norm of the output + 0.999 × RMSE of the spectral images. This loss function evaluates the consistency between the overall trend and the accurate values of the output celestial spectrum while retaining sharp characteristic peaks, guiding the reconstruction process in determining the characteristics of the celestial spectrum.
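The indoor loss can be sketched as below; the measurement term re-projects the reconstruction through the element-wise-product forward model described earlier (the function and variable names are illustrative):

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def indoor_loss(x_hat, x_gt, y_meas, M):
    """0.5 * spectral-image RMSE + 0.5 * measurement RMSE (a sketch).

    The measurement term re-measures the reconstruction through the forward
    model so that it is consistent with the recorded greyscale values.
    """
    y_hat = (M * x_hat).sum(axis=-1)       # re-project through the system
    return 0.5 * rmse(x_hat, x_gt) + 0.5 * rmse(y_hat, y_meas)
```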
