1. Introduction
The recent theory of compressive sensing (CS) provides a fundamentally new data acquisition approach. CS theory establishes that a signal, e.g. a sound record or an astronomical image, can be sampled at a much smaller rate than what is commonly prescribed by Shannon - Nyquist 1)(3. This theory has been widely used in different areas as spectral imaging, light field imaging, video acquisition, tomography, among others 4)(7 , assuming that the signal under analysis can be sparse in some representation basis. Formally, suppose that a given signal , is S-sparse on some basis
, such that
can be approximated by a linear combination of
S
vectors from
, with
. The acquired measurements can be expressed as
, where is known as the sensing matrix. Specifically, in compressive light field, a four dimensional image
, or its vector form
, is captured in the detector just taking 2 - dimensional (2D) coded projections of the underlying scene (1). H is the measurement matrix, whose structure is determined by the coded aperture entries of the acquisition system and α is a sparse vector that represents the signal
in the domain
. The inverse problem requires to solve the minimization given by
being
the reconstructed light field from
is a regularization parameter and
norm operators, respectively.
A light field is a specific parameterization of the plenoptic function. The plenoptic function was introduced as a complete holographic representation of the visual world. This function describes the observable light intensity at every view position and every instant of time . Adding spectral information intensity in any possible angle
results in a 7D function as
(8), where
is the plenoptic function,
are the spatial dimensions,
represent the angular information,
represents the wavelength and
denotes the time 8. Plenoptic imaging considers a subset of computational photography approaches; specifically, those that aim at acquiring the dimensions of the plenoptic function with combined optical light modulation and computational reconstruction as CS theory 9.
This work studies the plenoptic function in five dimensions (5D), assuming that the air is transparent in a ray space representation and the spectral information is included. The plenoptic function is reduced from 7D to 5D, ignoring the time t and the depth . The specific parameterization of the full plenoptic function can be seen as a light field in multiple wavelengths. This is, every angular 2D image has a manifold of spectrum bands, and this is called Multispectral Light Field. The knowledge of the spectrum at various points in the image can be useful in identifying the composition and structure of objects in the scene being observed by the imager. Moreover, spectral images are useful in many areas such as remote sensing, astrophysics, biochemistry, and many others (8). Five dimensional image acquisition and reconstruction from sparse measurements is possible due to the CS domain (5)(6)(10). The 5D function that describes a multispectral light field can be written as Eq. (1)
In general, this paper describes the multispectral light field acquisition using colored coded apertures. More specifically, an algorithm to simulate the multispectral light field sampling and the robust reconstruction from 2D projections was developed. Sampling is performed by means of a colored coded aperture that modulates not only the spatio - angular dimensions but the spectral information as well. The recovery of the underlying multidimensional signal is performed using the LASSO reconstruction algorithm.
The main contribution of this paper is the development of the mathematical model for multispectral light field acquisition. This paper is organized as follows: the proposed multispectral light field sampling model is developed in section 2, as well as, the multidimensional data cube reconstruction problem. Simulations and results with the proposed acquisition model are detailed in section 3 and finally, some conclusions are presented.
2. Compressive multispectral light field model
This section describes the mathematical model for multispectral light field coded projections acquisition, the development of the spectral overcomplete dictionary as a representation basis, the reconstruction method and a brief description of the colored coded apertures used in this work.
2.1. Multispectral light field acquisition model
The CS - based acquisition system is composed by three optical elements: the focal plane array (FPA) or detector, the main lens, and the most important element: the coded aperture. Traditionally, in these compressive light field systems, the light rays are modulated by a binary coded aperture which modulates the spatial and angular features of the scene, blocking and unblocking the light from the scene toward the sensor (5)(11). In the proposed acquisition system, the binary coded aperture is replaced by a colored coded aperture which modulates both the spatio - angular dimension and the spectral dimension. Figure 1 illustrates a sketch of the proposed acquisition system.

Figure 1 Compressive multispectral light field acquisition system. A colored coded aperture is placed at a distance 𝑑𝑙 from the image sensor. The light rays are modulated by the colored coded aperture before they are captured in the detector. The spatio- angular frame reference is illustrated in the bottom-right; each spatial point of the scene can be seen from different angles assuming a two-plane parameterization
The set of light rays in a free space is denoted as where
and
indicate the spatial coordinates,
the angular dimensions and
the spectral dimension (wavelength). The function
is the 5D plenoptic function described in Eq. (1). This set of rays is coded by the colored coded aperture defined by the transmittance as
Given that the operations along
coordinates are equivalent to those along
, a simplification in the spatial and angular dimensions,
can be considered and, the continuous measurement
on the detector can be expressed by Eq. (2)
In the optical path, the light ray set is once again parameterized by the colored coded aperture , producing a defocusing in the trajectory of the light rays. The diagram showed in Figure 2 illustrates this parameterization for a single light ray.

Figure 2 Illustration of the multispectral light field modulation through the colored coded aperture T(x, λ). The colored coded aperture modulates the multispectral light field before it is captured in the detector. The parameterization of the light rays is showed using just one light ray for explanation purposes (red line)
By similar triangles, the light ray that intersects the 𝑥 plane (detector plane) and the 𝑇 plane, also intersects the plane (lens plane) at
Given that the parameterization is conducted by the coded aperture plane, Eq. (2) can be rewritten as in Eq. (3)
This parameterization shows that multispectral light field imaging can be thought of as shearing the 5D multispectral light field, and then projecting it down to 2D 12.
The discretization of the colored coded aperture is defined through its transmittance function as Eq. (4),
Where is a binary value accounting for the block and unblock operation at the
voxel of the 5D coding,
and
denote the pixel size of the colored coded aperture and the sensor, respectively.
Using Eq. (4), the discrete detector measurement can be expressed as Eq. (5)
Where is the intensity at the
position of the detector (considering just one dimension of the 2D detector),
is a spatio - spectro - angular data cube and,
is the noise of the capturing process. The multispectral light field coded projection at the detector
can be also written in matrix form as Eq. (6)
Where
is the vectorized multispectral light field,
is the corresponding measurement, and
is the sensing matrix for the whole multispectral light field. Each submatrix
in Eq. (6) is a sparse matrix that contains the structure of the colored coded aperture
on its diagonal, for each angular view 𝑗 and for each spectral band
. Figure 3 shows an example of the
matrix structure for
and
pixels in the spatial dimension,
and
angular views and
spectral bands. The top half of the 𝐇 matrix in Figure 3 corresponds to the first angular view and the bottom half corresponds to the second angular view.
The H matrix, mathematically, represents the spatio - spectro - angular modulation element in the acquisition system.
The sampling and reconstruction processes for the multispectral light field are developed using small patches of the image instead of the full image. This approach allows to simultaneously solve many small and independent problems, instead of attempting to solve a single and large optimization problem (5). The notion of patches will be exposed in following sections. Algorithm 1 in Figure 4, developed in this work, shows the sampling process of a multispectral light field.
The input parameters of the sampling algorithm 1 are: the matrix, the multispectral light field
, and the patch sizes. For each iteration, a set of patches is sampled using the Eq. (6) and each measurement is stacked in the measurement vector
.
2.2. Reconstruction of Multispectral Light Field Projections
The inverse problem to recover the multispectral light field f from the measurements g requires to invert the linear system presented in Eq. (6). Since the number of measurements g is significantly smaller than the number of columns of H, i.e. , it is necessary to use sparse coding techniques to solve the underdetermined problem.
The main idea of sparse coding techniques is that underdetermined problems can be solved if the unknowns are sparse or can be represented in a sparse basis or overcomplete dictionary. Assuming that the signal f is sparse on some overcomplete dictionary
, f can be approximated by a linear combination of S columns of the dictionary
as
, where most of the coefficients in
have values close to zero. Thus, the coded projections in Eq. (6) can be rewritten as
. Then, the reconstruction consists on recovering
such that the
cost function is minimized as Eq. (7)
Being a regularization parameter and
the reconstructed multispectral light field from g.
Algorithm 2 (Figure 5) shows the reconstruction process where M - long vectors , from the measurements vector g are proccessed. For each subproblem, the optimization in Eq. (7) is solved. The main algorithm inputs are the measurements vector
, the measurement matrix H and the overcomplete dictionary
. The output of the reconstruction algorithm is the reconstructed multispectral light field f. Notice that the reconstructed multispectral patches are stacked in the vector f, and it is necessary to rearrange the vector to obtain the 5D multispectral image.
The quantification of the reduction in the data representation size is the compression ratio and is given by Eq. (8)
Where is the compressed size, in this case, it is the size of the spatial dimension (detector size) and
, the uncompressed size, is the size of the full data (multispectral light field size). The compression ratio 𝑟 can be used as a comparison measure.
2.3. Learning an Overcomplete Dictionary
Previous works in compressive light field have demonstrated high compression and a good data representation using an overcomplete dictionary as a representation basis. An overcomplete dictionary is composed of a set of fundamental building blocks of natural light fields or light field atoms (5). A light field atom is a 4D spatio - angular patch, or a set of pixels extracted from a training light field.
In order to obtain a good representation of the 5D multispectral light field in the dictionary domain, the 4D light field atoms presented in (5) are modified adding the spectral information. Let be a set of
training signals or 5D patches, where the size of each 5D patch is
. Adding these signals as the columns of a matrix
, the general dictionary learning problem is to find a sparsifying dictionary
by solving Eq. (9)
Where is a training set that contains q signals,
is the overcomplete dictionary,
is a set of
sparse coefficient vectors, and
is the sparsity level or the number of nonzero entries. The non - convex
pseudo - norm,
, counts the amount of nonzero elements in a vector. In general, the minimization in Eq. (9) looks for the best possible dictionary for the sparse representation of the set C(5)(13). Algorithm 3 shows the process to train a dictionary (Figure 6).
The optimization problem of learning a dictionary can be seen as a search for sparse representations with coefficients summarized in the matrix C The dictionary is iteratively updated together with the sparse coefficients. Assuming that both C and D are fixed, each dictionary column and the corresponding
row in C are denoted as
. For each column of D, a group of examples
are defined using the atom
. Then, the overall representation error matrix is calculated in
. The value
is built choosing only the columns corresponding to
in
, implying a selection of error columns that correspond to examples that use the atom
. Applying SVD (Singular Value Decomposition) in
such as
, the dictionary column
is updated choosing the first column of the matrix
, and the coefficient vector
is updated choosing the first column of
multiplied by
. The best dictionary is obtained to represent the 5D patch signal as sparse composition(13).
2.4. Colored Cod7ed Apertures
The colored coded apertures are a recent advance that allows to modulate the wavelengths of interest. These apertures have been used in applications such as artificial compound eyes (14), multi-focusing and depth estimation (15), deblurring and matting (16) and compressive spectral imaging (17).
A colored coded aperture can be seen as an array of binary coded apertures where each voxel blocks or unblocks a specific wavelength set of the spectral data cube pixel (Figure 7).

Figure 7 Illustration of a colored coded aperture and its equivalent binary coded aperture equivalence. Each colored coded aperture voxel can operate as a low - pass or high - pass
optical filter
The colored coded aperture pixels can modulate the spectral data as low - pass or high - pass
optical filters. Each colored coded aperture pixel is one of many possible optical filters whose spectral response can be selected. For instance, a low pass filter
(Figure 7 bottom) permits to pass the wavelengths
,
and
in the spectral data cube pixel. A high pass filter
permits to pass the wavelengths
and
. The use of this optical element has demonstrated a high probability of correct reconstruction (high PSNR) in previous researches (17)(18.
3. Results
In order to test the proposed model, a set of compressive measurements is simulated using the forward model in Eq. (6) and the algorithm 1. These measurements are constructed using a synthetic multispectral light field data set. The multispectral light field was built employing a light field data set published by the MIT Media Lab 5. The original light field has 593×840 pixels of spatial resolution, 3×3 angular views and L=3 bands in the visible spectrum (RGB). This light field was modified adding two new spectral bands to each angular image. These matrices are located between the existing bands and a cubic interpolation (Cubic spline) defines the value of the new pixels. Thus, the resultant synthetic dataset is a multispectral light field with 5 spectral bands, 593×840 pixels of spatial resolution, and 3×3 angular views (Figure 8(a)). The spectral bands range between 450nm and 650nm.

Figure 8 Synthetic data set, dragon’s scene. (a) Multispectral light field with spatial (x,y), angular (θ, ) and spectral information
. The spatial dimension of the data set is reduced for each angular view in each spectral band. (b) The reduced version of the data set is used to test the proposed algorithm, the size of the reduced data set is 405×360 pixels, 3×3 angular views and 5 spectral bands
A reduced version with 405 x 360pixels, 3 x 3 angular views and 5 spectral bands of the data set is used to test the proposed algorithm (Figure 8(b)). The underlying signal is reconstructed solving the optimization problem described in Eq. (7), using the LASSO algorithm 19 briefly summarized in Algorithm 2. The reconstruction quality was quantified with the Peak signal-to-noise ratio (PSNR), which is measured in decibels (dB). For the simulations, a patch size was used i.e. 9 x 9 pixels in spatial resolution, 3 x 3 angular views and 5 spectral bands.
The learned dictionary was created choosing 5D random patches from four training multispectral light field data sets show in Figure 9. The patch size was, i.e. 9 x 9 pixels of spatial resolution, 3 x 3 angular views and spectral bands. Algorithm 3 was employed to learn the overcomplete dictionary 13. All simulations were conducted using a dual Intel Xeon E5-2670v2, 2.5 GHz processor and 64 GB RAM.

Figure 9 Synthetic training light fields mapped to RGB. Available online at the MIT Media Lab webpage
3.1. Reconstruction from the measurements vector
The notion of using patches helps in speeding up the reconstruction process. The 5D overlapped patches are reconstructed separately. A sliding window was used to merge the reconstructed overlapping patches with a median filter. The reconstructions are shown in Figures 10 and 11, and the PSNR achieved is exhibited for each reconstructed slide.

Figure 10 (Top) The original and (Bottom) reconstructed multispectral light fields for the first angular view and its five spectral bands. The PSNR is presented for each reconstructed spectral band

Figure 11 Reconstruction of the third spectral band (L=550 nm). All the angular views are depicted. The range of achieved PSNR is between 24 dB and 25 dB for the different views
Figure 10 (top) shows the first angular view of the multispectral light field ground truth, mapped to a RGB profile and its five spectral bands between 450nm and 650nm. The bottom row shows the original RGB profile and its five reconstructed spectral bands. Figure 11 shows the reconstruction of the third spectral band for all angular views. The 5D multispectral light field was reconstructed from just 2D coded projections. It can be noticed that the reconstructions exhibit a good spatial and angular approximation of the synthetic scene. Figure 12 presents the spectral signatures of the original image compared to the reconstructed signatures for three points of a specific angular view. The resultant curves are close to the originals showing a good estimation of the multispectral light field information.

Figure 12 Spectral signature comparison for three spatial locations indicated by the points P1, P2 and P3 for the first angular view (a) and the last angular view (b)
The achieved compression ratio in the reconstruction of the multispectral light field is calculated in Eq. (10) using Eq. (8)
The percentage of achieved compression is 97.8%. In other words, the multispectral light field reconstruction obtained with the proposed model requires only 2.2% of the whole multidimensional image compared with a traditional acquisition model that requires 100% of the information. Figure 13 shows an overview of the reconstruction quality for the dragon’s scene. The achieved PSNR is between 22 dB and 25 dB.

Figure 13 Reconstruction quality (PSNR) achieved for each angular view of each spectral band measured in dB
The Structural SIMilarity (SSIM) index is calculated in order to estimate the image quality in terms of: luminance, contrast and structure20. SSIM provides a better consistency with the perceived quality measurement. The SSIM range is between -1 and 1, where 1 is the identical image. Table 1 shows the SSIM calculated for the third spectral band and a comparison with the calculated PSNR. Similar results were obtained for the remaining 4 spectral bands.
4. Conclusions
A mathematical model of the sampling and reconstruction multispectral light field based on compressive sensing with colored coded apertures has been developed. The proposed model includes the spectral information in a light field without increasing the measurements size captured in the detector. Simulations show the reconstruction of a multispectral light field using just the 2.2% of information of the scene. The full spatio - spectro - angular data cube is reconstructed exhibiting PSNR values between 22 dB and 25 dB. In a future work, this model could be extended to the optimization of the colored coded aperture pattern in order to improve the reconstruction quality.
5. Acknowledgments
The authors gratefully acknowledge the Vicerrectoría de Investigación y Extensión of Universidad Industrial de Santander for supporting this work registered under the project titled “Implementación de matrices de filtros ópticos con patrones aleatorios para la adquisición de imágenes hiperespectrales” with VIE code 1363