I Introduction
Hyperspectral imaging techniques have been widely applied in various fields such as astronomy [1], remote sensing [2], and biomedical imaging [3]. Unlike ordinary imaging systems, hyperspectral imagers capture a three-dimensional (3D) datacube, i.e., a 2D array of vectors that contain the spectral information at each spatial location. Inspired by compressive sensing (CS)
[4, 5], researchers have adopted joint sensing and reconstruction paradigms that measure a subset of the 3D spectral datacube [6, 7, 8, 9] and utilize CS inversion algorithms to retrieve a 3D estimate of the underlying hyperspectral images. This sensing paradigm offers the traditional benefits of CS: reduced data rates and system complexity, at the expense of computational algorithmic development and post-processing.
The coded aperture snapshot spectral imaging (CASSI) systems [7, 8] are examples of CS hyperspectral imaging systems. The CASSI paradigm encodes each of the datacube's spectral channels with a unique 2D pattern, which is the underlying operating principle behind code division multiple access (CDMA). CASSI systems form an image onto a coded aperture placed at an intermediate image plane to spatially modulate the datacube with high-frequency patterns (Fig. 1). A disperser placed in a pupil plane behind the coded aperture spectrally shifts the coded image, effectively mapping each spectral channel onto its own unique coding pattern when multiplexed onto a monochrome sensor. Unlike other hyperspectral imaging techniques [10, 11, 12], CASSI can obtain a discrete 3D estimate of the target spectral datacube from as little as a single 2D measurement. Such operation renders the system's forward model highly underdetermined; inversion requires the use of CS algorithms [13, 14, 15, 16]. Compressive hyperspectral imaging requires the signal under estimation to be sparse in a basis that is incoherent with the system sensing matrix [7]. The reconstruction is accomplished using optimization algorithms, such as gradient projection for sparse reconstruction (GPSR) [7] or two-step iterative shrinkage/thresholding (TwIST) [17]. GPSR assumes sparsity of the entire datacube in a fixed (wavelet) basis, while TwIST is based on a piecewise-constant spatial intensity model (when the total variation norm is used) for hyperspectral images.
Distinct from the above optimization algorithms, blind CS algorithms [18] have been applied to CASSI systems by learning dictionaries from the measurements. Blind CS inversion strategies seek to recover 3D patches of the hyperspectral datacube jointly with the shared dictionary (inferred from the measurements). Each patch is a sparse combination of the dictionary atoms. Since the dictionary is unknown a priori, this is called blind CS [19]. This blind CS model shares statistical strength among similar image patches at different spatial locations. Additionally, this CS approach is task-driven, since the learned dictionary is appropriate for different tasks [20]. In this paper, we propose a new blind CS model that imposes compressibility [21], rather than sparsity, on the recovered dictionary coefficients. Specifically, we use global-local shrinkage priors [22, 23, 24] on the dictionary coefficients for each patch, under a Bayesian dictionary learning framework. This setting drives the reconstructed dictionary coefficient vector to contain many small (i.e., close to zero) components while allowing for a few large components, but avoids explicit sparsity enforcement. This model is well suited to high-dimensional hyperspectral signals, since it can extract more information from the limited (in situ learned) dictionary atoms.
The quality of the reconstructed hyperspectral images relies on the conditioning of the sensing matrix. More accurate recovery is possible with additional measurements of the same scene [17] or by using digital micromirror device (DMD) arrays [25]; however, these methods increase system cost and energy consumption. This paper features a blind CS algorithm that employs the RGB image of the target scene as side information and demonstrates substantial improvements in reconstruction quality with a single CASSI measurement. In particular, the proposed algorithm learns a joint dictionary from a single CASSI measurement and the corresponding RGB image (of the same scene) and then reconstructs the hyperspectral datacube. Since the RGB image is strongly correlated with the hyperspectral images, this joint model dramatically aids the dictionary learning algorithm, thereby improving the reconstruction.
Furthermore, we propose a new camera that uses a spatial light modulator (SLM) [26] as an active coding compressive spectral imager, jointly modulating the spatial and spectral information of the scene at the intermediate image plane. This technology differs from CASSI and DMD-modulated CS imagers in that no dispersive element is required for multiplexing the spectral channels.
The remainder of the paper is organized as follows. Section II reviews the mathematical model of CASSI and introduces the new camera. Section III develops the new blind CS algorithm to reconstruct hyperspectral images. Experimental results are presented in Section IV, and Section V summarizes the paper.
II Hardware
In this section, we first review the CASSI imaging process and then introduce the new camera. Finally, a shared mathematical model for both cameras is proposed.
II-A Review of CASSI
A common CASSI design [7, 8, 17], shown in Fig. 1, adapts the available hardware to encode and multiplex the 3D spatio-spectral information onto a 2D detector. This modulation process is based on physically shifting the hyperspectral datacube, multiplied by the binary coding pattern, via chromatic dispersion. The coding pattern consists of an array of square, binary (100% or 0% transmission) apertures, which provide full or zero photon transmission. The camera's disperser (in traditional hyperspectral CS [6, 27], a prism or diffraction grating) laterally displaces the coded image as a function of wavelength; the lateral shift is wavelength-dependent and is referenced to the disperser's central wavelength. As previously mentioned, this coding process can be considered a form of CDMA, whereby each channel is modulated by an independent coding pattern on the detector plane. The detector integrates the spectrally shifted planes along the spectral axis. Information can be recovered by separating channels based on their projected coding patterns. This multiplexing process can be represented by the following mathematical forward model:
(1) 
where the left-hand side of (1) is the continuous form of the detector measurement, representing the sum of the dispersed coded images. Since the detector array is spatially pixelated with a fixed detector pixel pitch, the discretized detector measurement becomes:
(2) 
Finally, the discrete measurement for each pixel can be written as:
(3) 
where the terms on the right-hand side of (3) are the spatial encoding pattern, the discretized spectral density of the scene, and the noise, respectively.
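As a rough numerical illustration of this discrete model, the sketch below simulates a coded, dispersed, summed measurement. All sizes, the one-pixel-per-channel horizontal shear, and the noise level are hypothetical stand-ins; the actual dispersion direction and magnitude are set by the system calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 32x32 scene with 8 spectral channels.
H, W, K = 32, 32, 8
datacube = rng.random((H, W, K))           # discretized spectral density
code = rng.integers(0, 2, size=(H, W))     # binary (0/1) coded aperture

# Each channel is coded by the aperture, sheared by one pixel per
# channel along the column axis, then summed on the detector.
meas = np.zeros((H, W + K - 1))
for k in range(K):
    meas[:, k:k + W] += code * datacube[:, :, k]

# Additive Gaussian noise on the detector.
meas += 0.01 * rng.standard_normal(meas.shape)
```

Note that the detector is slightly wider than the scene here because the shear spreads the K channels over K - 1 extra columns.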
II-B A New Camera: SLM-CASSI
Here we use an SLM to encode the 3D spatial-spectral information on a 2D grayscale detector, a CDMA strategy similar to CASSI for snapshot hyperspectral imaging. An SLM is an array of micro cells placed on a reflective layer; each cell contains a nematic liquid crystal that adds pixel-level phase to the incident light as a function of voltage, thereby changing the polarization state on a per-pixel basis [26]. Each layer of the liquid crystal can be considered a thin birefringent optical material; the orientation and the relative refractive-index difference between the fast and slow axes determine the effective birefringence. Since most birefringent phase modulators are nominally sensitive to wavelength, this element can assign wavelength-dependent transmission patterns to multiplex all spectral channels into the compressive measurement. The hyperspectral slices can then be separated from the coded data via CS inversion algorithms.
II-B1 Experimental Setup
A schematic of this SLM-based CASSI system is shown in Fig. 2. An objective lens (L1) images the scene from a remote distance; the collimation lens (L2) and imaging lens (L3) then relay the image through the polarizing beamsplitter and the achromatic quarter-wave retarder onto the SLM. The retarder increases the contrast of the SLM polarization array by compensating the extra phase retardation in the liquid crystal device. The spatio-spectral coding function is implemented on the SLM as an 8-bit, pseudo-random, pixel-level grayscale phase retardation pattern. The phase retardation manifests as wavelength-dependent amplitude modulation upon re-entry through the polarizing beamsplitter toward the camera. Finally, the modulated image is projected by the imaging lens (L4) onto the detector plane and recorded by the detector. The following mathematical forward model describes the multiplexing process of the compressive sampling:
(4) 
where the transmission patterns in (4) are wavelength-dependent and are provided by the SLM; they can be calibrated by analyzing the SLM's electrically controlled birefringence. Since the detector array is spatially pixelated, the detector measurement is given by:
(5) 
where the coding term in (5) represents the discretized transmission patterns. Fig. 3 is a photograph of the experimental setup. The setup includes a 60 mm objective lens (Jenoptik), a 75 mm achromatic relay lens (Edmund Optics), a polarizing beamsplitter (Newport), an achromatic quarter-wave plate (Newport), a liquid-crystal-based SLM (Pluto, Holoeye), two 75 mm imaging lenses (Pentax), and a monochrome CCD camera (Pike F421, AVT) with 2048×2048 pixels that are 7.4 μm square. A 450–680 nm band-pass filter (Baader) is mounted on the objective lens to block unwanted ultraviolet and infrared light (the SLM and camera are optimized for visible-range operation). The SLM provides strong modulation and effective multiplexing to fulfill the requirements of compressive sensing. This modulator is based on a reflective Liquid Crystal on Silicon (LCoS) microdisplay technique [26], which has a 1920×1080-pixel active area with an 8 μm pixel pitch.
II-B2 System Calibration
The effect of the 8-bit, pseudo-random voltage pattern on the spectral channels can be estimated by recording the system response generated by the SLM. Fig. 4 records the transmission under monochromatic illumination and a homogeneously applied voltage on the modulator. We average the transmission across the center area of the SLM and calculate its relative transmission. A potential advantage of this SLM is that we can easily change the code to obtain different modulations. (This also opens the door to adaptive compressive sensing [28, 29, 30]. However, after extensive experiments, we found the contribution of adaptive sensing to be very limited in the CS inversion, partially due to the mismatch between the designed code and the calibration; random coding performs well in our camera.)
We assume that the system operator provides a one-to-one mapping (1:1 magnification of the SLM pixels onto the detector) between the microdisplay and the detector array. Theoretically, the system operator can be estimated from the SLM's calibration data and the voltage applied to the SLM. However, one must account for errors in the real system projection. For example, optical aberrations and subpixel alignment discrepancies between the detector and the SLM may break the ideal mapping and image some SLM pixels onto several detector pixels. Part of the transmission code may therefore deviate from its ideal value, resulting in an inaccurate system operator. Since the system operator dominates the quality of the object estimate in this inverse problem, a careful calibration of the system operator is required.
A better representation of the system response can be acquired by recording the transmission pattern illuminated by each spectral band. The revised system operator includes all the possible coding patterns generated by the SLM, which contribute to the data reconstruction. Here we combine a tungsten light source (LSHT250, Horiba) with a monochromator (iHR320, Horiba) to quantize the spectral dimension into bands of finite width. Each band has a 7.58 nm full width at half maximum (FWHM). The scene's spectral irradiance is recorded by a fiber-optic spectrometer (USB2000, Ocean Optics); these values are taken as ground truth in the experiments presented later in the paper. The number of spectral channels and their central wavelengths are determined by the grating period (1800 periods/mm) and the optical path length of the monochromator. To better represent the continuous light source, two adjacent spectral bands are separated by the monochromator's FWHM. At this calibration resolution, the system's spectrally sensitive bandwidth (450–680 nm) has been discretized into spectral channels. Importantly, the spectral resolution of the reconstructed data is determined by the monochromator's resolving power during calibration; smaller FWHM values result in larger numbers of calibrated and reconstructed spectral channels. Compared to spectral imagers that rely on dispersive elements to spatially shear a physical code (as in CASSI), this SLM-based spectral imager can improve the spectral resolution without revising the main camera's optical design.
II-C Shared Mathematical Model of CASSI and SLM-CASSI
Assume we are measuring a hyperspectral datacube with a given number of spectral channels over a fixed spatial extent. We can interpret both CASSI and SLM-CASSI measurements as a summation of the 3D Hadamard product of the datacube with per-channel modulation codes. In CASSI, each code is a shifted version of the same physical code [18]; in SLM-CASSI, each spectral modulation pattern depends on the SLM response. For each pixel, both (3) and (5) can be represented as:
(6) 
The pixel-wise modulation of CASSI makes patch-based reconstruction algorithms feasible.
Since the sensing paradigm reconstructs all spectral channels from a single measurement, the compression ratio of the CASSI and SLM-CASSI systems discussed above equals the number of spectral channels. The compressive sampling process in both spectral imaging systems can be represented by the same matrix, described in the following section.
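A minimal numerical sketch of this shared model follows; all sizes are illustrative and the codes are random stand-ins (a shifted binary code in CASSI, the calibrated SLM response in SLM-CASSI).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 16x16 spatial extent, 6 spectral channels.
Nx, Ny, L = 16, 16, 6
X = rng.random((Nx, Ny, L))   # hyperspectral datacube
C = rng.random((Nx, Ny, L))   # per-channel modulation codes

# Detector image: sum over channels of the Hadamard (elementwise)
# product of code and datacube -- an L:1 compression per pixel.
Y = (C * X).sum(axis=2)
```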
III Reconstructing Hyperspectral Images with Blind Compressive Sensing
In this section, we develop a new blind Bayesian CS model to reconstruct hyperspectral images from CASSI or SLMCASSI measurements. The proposed model is a generalized framework which can also be used in denoising and inpainting problems, among others [31].
III-A Dictionary Learning Model
We decompose the 3D hyperspectral datacube into small overlapping patches. In vector form, each patch is represented as a vector, and we seek a dictionary whose atoms represent each patch, based on the captured image. For each patch, we have the corresponding 2D measurement patch and the 3D mask (coding) patch. Considering the vectorized format of the measurement for each patch, and the corresponding mask patches (each the vector form of a 2D slice of the 3D mask patch), we have the measurement model
(7)  
(8)  
(9) 
where the last term represents the additive Gaussian noise, which we model as zero-mean Gaussian with a noise precision parameter. (The extension of our model to the non-uniform denoising problem [32] is straightforward, i.e., by imposing spatially varying noise models for different patches.) We place a diffuse gamma prior on the noise precision
(10) 
where the shape and rate parameters are hyperparameters. Taking account of the dictionary learning model, (7) can be written as:
(11)
where the coefficient vector describes the decomposition of the signal in terms of dictionary atoms. Given the measurement set and the forward matrices, we aim to jointly learn the dictionary, the coefficients, and the noise precision parameter to recover the datacube. One key difference of our model compared to other blind CS work [19] is that each patch has a unique forward matrix; this is inspired by our cameras, since the mask is generated randomly and also accounts for the system calibration.
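The per-patch measurement model can be sketched numerically as follows. All dimensions and the sensing matrix are hypothetical stand-ins; in the actual cameras, each patch's sensing matrix comes from the calibrated, randomly generated mask.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dimensions: p = vectorized 3D patch length,
# m = vectorized 2D measurement length, M = number of dictionary atoms.
p, m, M = 96, 16, 32
D = rng.standard_normal((p, M))        # shared dictionary
s = np.zeros(M)                        # compressible coefficient vector:
idx = rng.choice(M, size=4, replace=False)
s[idx] = rng.standard_normal(4)        # a few large, the rest near zero
Phi = rng.random((m, p))               # patch-specific sensing matrix
eps = 0.01 * rng.standard_normal(m)    # Gaussian noise (precision tau)

y = Phi @ (D @ s) + eps                # per-patch measurement
```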
We model each dictionary atom as a draw from a Gaussian distribution,
(12) 
The coefficients are assumed drawn from the marginalized distribution
(13)
where the scale mixing density is the inverse-gamma distribution. One parameter acts as a "global" scaling for all coefficients of the patch, and another acts as a "local" weight for each individual coefficient of that patch. We place a gamma prior on the local weights, where one may set the hyperparameters to ensure that most weights are small; this encourages most coefficients to be small. By maximizing the log posterior to obtain a point estimate for the model parameters, one observes that the log of the prior in (13) corresponds to adaptive Lasso regularization [33]. Equivalently to (13), the model for the coefficients may be represented in the hierarchical form
(14)  
(15)  
(16)  
(17) 
where latent variables are included in the generative model, instead of being marginalized out as in (13). A vague/diffuse gamma prior is placed on the scaling parameters. Despite introducing the latent variables, the form in (15) is convenient for computation, and with an appropriate choice of hyperparameters, most local weights are encouraged to be large. A large local weight corresponds to a small coefficient; the model thus imposes that most coefficients are small, i.e., it imposes compressibility. Note that (14) differs from the model used in [34], where a single global scale is used for all patches; here, we impose different compressibility for each patch by inferring a unique per-patch scale, thus providing flexibility.
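A draw from this kind of global-local hierarchy can be sketched as below. The hyperparameter values are purely illustrative, not the paper's settings; the point is that large local weights shrink most coefficients toward zero while a few small weights permit occasional large components.

```python
import numpy as np

rng = np.random.default_rng(3)

M = 5000
# "Global" scale for the patch and "local" per-coefficient weights;
# the local weights are drawn so that most are large.
gamma = rng.gamma(shape=2.0, scale=1.0)         # global scaling
lam = rng.gamma(shape=3.0, scale=5.0, size=M)   # local weights

# Large local weight -> small variance -> coefficient shrunk toward
# zero; small weights allow heavy-tailed, large coefficients.
s = rng.normal(loc=0.0, scale=np.sqrt(1.0 / (gamma * lam)))
```

Unlike a spike-slab draw, no component is exactly zero; the vector is compressible rather than sparse.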
In order to automatically infer the number of necessary dictionary atoms, we can replace (11) with
(18)  
(19)  
(20) 
The multiplicative gamma prior (MGP) [35] used above is designed to stochastically increase the precision as the atom index increases. During inference, we observe that as the index increases, the corresponding atom weight tends to zero. This results in an approximate ordering of the dictionary atoms by weight, from which we can infer the importance of the columns of the dictionary. These weights also help avoid overfitting during the learning procedure: we can update a fraction of the dictionary according to the weights and discard the atoms with small weights to reduce the computational cost (refer to the Appendix for the inferred weights and learned dictionary).
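The MGP construction can be sketched as follows; the hyperparameters a1, a2 are illustrative. Each precision is a running product of gamma draws, so with a2 > 1 the expected precision grows with the atom index and the corresponding atom weights shrink.

```python
import numpy as np

rng = np.random.default_rng(8)

M, a1, a2 = 20, 2.0, 3.0         # illustrative atom count, hyperparameters
delta = np.empty(M)
delta[0] = rng.gamma(a1, 1.0)
delta[1:] = rng.gamma(a2, 1.0, size=M - 1)

# Precision of atom m is delta_1 * ... * delta_m; it stochastically
# increases with m, so later atoms carry ever smaller weight.
tau = np.cumprod(delta)
```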
III-B The Statistical Model
The full statistical model of the proposed blind CS approach is:
(21)  
(22)  
(23)  
(24)  
(25)  
(26)  
(27)  
(28) 
where broad priors are placed on the hyperparameters. Note that in (23), the coefficients are also scaled by the noise precision.
Let denote the parameters to be inferred. The log of the joint posterior may be expressed as:
(29)  
(30)  
(31) 
The terms in (29) are widely employed in optimization-based dictionary learning [36, 32, 37]. The first term imposes a fit between the model and the observed data, and the second term imposes a regularization on the dictionary atoms. The third term regularizes the relative importance of the aforementioned two terms in (29) via a weighting updated during the inference. The terms in (30) impose both compressibility (via the global-shrinkage and local-shrinkage parameters) and a regularization on the coefficients. Finally, the terms in (31) impose both regularization and shrinkage on the remaining parameters.
III-C Related Models
The shrinkage manifested by (30) is the most distinct aspect of the proposed model relative to previous optimization-based (and Bayesian [31, 18]) approaches. These other approaches effectively impose sparsity through explicit regularization of the coefficients; this corresponds to replacing the terms in (30) with a sparsity-promoting penalty. To solve an optimization-based analog of the proposed approach, one seeks to minimize the objective function
(32)  
where the regularization parameters are typically set by hand (e.g., via cross-validation). One advantage of the Bayesian framework is that we infer posterior distributions for these regularization parameters, along with similar posterior estimates for all model parameters, without cross-validation. We also note that if we replace (30) by the spike-slab prior used in [31], the model reduces to BPFA. As opposed to this spike-slab prior, which imposes sparsity directly, our shrinkage prior imposes compressibility, which is more appropriate for the high-dimensional hyperspectral image patches. The global-local shrinkage prior used here can extract more useful information from the limited dictionary atoms. (We performed denoising and inpainting experiments with benchmark color images and compared with K-SVD [32] and BPFA [31]; the results are shown in the Appendix. The proposed algorithm consistently performs better than the above two methods.)
Other forms of shrinkage priors, such as the Gaussian scale model and the Laplace scale model, can be found in [38, 39, 40, 41]. Aiming to better mimic the marginal behavior of discrete mixture priors, global-local shrinkage priors [22, 23] have been developed to offer sufficient flexibility in high-dimensional settings, which inspires our model.
III-D Inference
Due to local conjugacy, we can write the conditional posterior distributions for all parameters of our model in closed form, making the following Markov chain Monte Carlo (MCMC) inference based on Gibbs sampling a straightforward procedure.

Sampling :
(33) 
Sampling :
(34) 
Sampling :
(35) 
Sampling :
(36) where the conditional is an inverse-Gaussian distribution.

Sampling :
(37) where the conditional is a generalized inverse Gaussian distribution, whose normalizing constant involves the modified Bessel function of the second kind.

Sampling :
(38) where the number of nonzero entries in the coefficient vector appears in the update, and "−" refers to the conditioning parameters of the distributions.
The update equations for the MGP-related parameters can be found in [35] (also listed in the Appendix). In applications where speed is important, we can use the conditional posteriors above to derive a variational Bayes (VB) inference algorithm for our model, which loosely amounts to replacing conditioning variables with their corresponding moments. Details of the VB procedure can be found in the Appendix.
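As one concrete example of the closed-form conditionals such a Gibbs sampler exploits, the conjugate update for a Gaussian noise precision under a Gamma prior takes the familiar form below. This is a generic sketch, not the paper's exact update; the hyperparameters and the residual vector are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)

a0, b0 = 1e-6, 1e-6                         # diffuse Gamma prior
residual = 0.1 * rng.standard_normal(200)   # stand-in for y - Phi D s
n = residual.size

# The Gamma prior is conjugate to the Gaussian likelihood, so the
# conditional posterior of the precision is again Gamma.
a_post = a0 + 0.5 * n
b_post = b0 + 0.5 * np.sum(residual ** 2)   # posterior rate
tau = rng.gamma(shape=a_post, scale=1.0 / b_post)
```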
TABLE I: PSNR (mean ± standard deviation across spectral channels) of the reconstructions for the five natural scenes.

Method             Scene 1          Scene 2          Scene 3          Scene 4          Scene 5
Shrinkage w/o RGB  30.9673±5.0328   26.6319±5.5290   32.8013±4.1525   42.9004±5.4524   34.5532±5.1652
Shrinkage w/ RGB   32.6175±4.0417   34.0775±4.3469   35.1950±2.9645   47.5275±4.7856   35.8182±2.0722
TwIST              21.8238±2.9894   23.2787±5.8818   25.2030±3.4791   19.9874±3.7625   21.6566±2.2755
Bregman            30.7027±4.6553   27.2907±5.4581   32.8595±3.8964   27.1390±4.0307   34.0448±4.5355
III-E RGB Images as Side Information
While reconstructing the hyperspectral images is challenging (the compression ratio equals the number of spectral channels when a single measurement is available), an RGB image of the same scene can be obtained easily by an off-the-shelf color camera in the unused path of the system (i.e., the light directly reflected off the polarizing beamsplitter), as shown in Fig. 2. We here consider using the RGB image as side information to aid the reconstruction.
With an additional side RGB camera, the measurement is a joint dataset composed of the RGB image and the CASSI measurement, and the compression ratio is correspondingly reduced. The new measurement model is:
(39) 
where the additional observation is the patch-format RGB image. Considering each patch,
(40) 
where the new terms are the vectorized RGB image patch corresponding to each hyperspectral patch and its associated sensing matrix. The dictionary learning method proposed in Section III-A is now performed on the joint dataset to learn a super-dictionary spanning both modalities.
Given the joint measurements, the proposed model learns the dictionary and coefficients simultaneously. For the recovery of the hyperspectral images, we select the top rows from the product of the dictionary and the coefficient matrix. Averaging the overlapping patches and reformatting the data to 3D gives us the desired hyperspectral images. We aim to obtain both clear images at each wavelength and correct spectra for each pixel, both of which are embedded in the dictionary atoms. Since the RGB images are fully observed and depict the same scene as the hyperspectral images, this side information helps the algorithm learn a better dictionary, thereby improving the reconstruction quality.
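Schematically, each joint observation stacks the compressed patch measurement on top of the fully observed RGB patch; the super-dictionary is learned on this stacked data, and its top rows reconstruct the hyperspectral patch. The sketch below uses hypothetical sizes and a stand-in linear RGB formation operator.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sizes: p = hyperspectral patch length, m = compressed
# measurement length, r = RGB patch length, M = number of atoms.
p, m, r, M = 96, 16, 48, 32
x = rng.random(p)            # unknown hyperspectral patch
Phi = rng.random((m, p))     # per-patch CASSI sensing matrix
R = rng.random((r, p))       # stand-in RGB formation operator

# Joint dataset for dictionary learning: compressed CASSI measurement
# stacked with the fully observed RGB patch.
z = np.concatenate([Phi @ x, R @ x])

# A learned super-dictionary has p + r rows; the top p rows (times the
# inferred coefficients) give the hyperspectral estimate.
D_joint = rng.standard_normal((p + r, M))
D_hsi = D_joint[:p, :]
```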
Another way to use the RGB image is to treat each R, G, and B channel as a superposition of the hyperspectral images, with weights corresponding to the quantum efficiencies of the R, G, and B sensors. However, these quantum efficiencies may differ from camera to camera. Here, we aim to propose a general and robust framework that combines RGB images with CASSI measurements; therefore, the formulation in (39) is adopted.
In our experimental setup, a grayscale image may also be used as side information. Since an RGB camera is inexpensive and carries richer spectral information, we only consider the RGB case in the following experiments. In the results presented below, the RGB image is measured separately by an additional camera. We are now modifying our SLM-CASSI system to capture the RGB image and the compressed hyperspectral image simultaneously, as illustrated in Fig. 2.
IV Experimental Results
In this section, we verify the proposed blind CS inversion algorithm on diverse datasets. For now, we solely consider the case where a single measurement (i.e., the compressed hyperspectral image) is available. (Note that [18] did not show the algorithm proposed therein working with a single measurement. In this paper, we focus on the single-measurement case; our model can readily be used in the multiple-measurement scenario. We also tried the method proposed in [18], and the results are worse than ours.) The proposed blind CS algorithm is compared with: (i) two-step iterative shrinkage/thresholding (TwIST) [13], using a 2D total variation norm on the images at each wavelength (we also tried 3D total variation (TV) on the entire datacube; the results are slightly worse than with 2D TV, so we only show the 2D TV results of TwIST); and (ii) the linearized Bregman algorithm [15]. The linearized Bregman approach regularizes the spectral and spatial dimensions with norms of the discrete cosine transform and wavelet transform coefficients, respectively. Note that TwIST and linearized Bregman are global algorithms; the entire dataset must be loaded into RAM when performing the inversion. Conversely, our proposed method is local (patch-based); online learning is feasible and the inversion can be performed in batch mode [42].
We employ a fixed spatial patch size. The compression ratio depends on the dataset. The proposed model can learn the number of dictionary atoms from the data. During experimentation, we have found that a moderate setting of the initial atom count usually provides good results, and we have verified that further increasing it does not significantly change the outcome of our methods. All code used in the experiments was written in Matlab and executed on a 3.3 GHz desktop with 16 GB RAM.
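The patch-based pipeline (extract overlapping patches, invert each, then average the overlaps back into the cube) can be sketched as follows. Patch size and stride are illustrative, and the "inverted" patch here is simply the true patch, so the averaged result reproduces the cube exactly.

```python
import numpy as np

rng = np.random.default_rng(6)

H, W, K, P, stride = 16, 16, 4, 4, 2   # illustrative sizes
cube = rng.random((H, W, K))

recon = np.zeros_like(cube)
counts = np.zeros((H, W, 1))
for i in range(0, H - P + 1, stride):
    for j in range(0, W - P + 1, stride):
        patch = cube[i:i + P, j:j + P, :]   # 3D patch; stand-in for the
                                            # CS-inverted patch estimate
        recon[i:i + P, j:j + P, :] += patch
        counts[i:i + P, j:j + P, :] += 1
recon /= counts                             # average overlapping patches
```

Because each patch is processed independently, this structure is what makes online and batch-mode inversion feasible.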
We evaluate the proposed algorithm on synthetic data, presented in Section IV-A, and on real data captured both by CASSI [8, 17] and by the proposed SLM-CASSI camera, presented in Section IV-B. We denote our method as "shrinkage" in the experiments (in figures and tables), as the shrinkage prior is used in our model. To evaluate the algorithm's performance, we use the PSNR of the reconstructed images at different wavelengths and the correlation between the reconstructed spectrum and the true (reference) spectrum.
The computational time of our model is similar to that of the BPFA model used in [18] and of linearized Bregman. TwIST provides faster, but generally worse, results than our algorithm in terms of the metrics mentioned above. Quantitatively, linearized Bregman and TwIST require about 20 minutes and 14 minutes, respectively, to reconstruct the bird data.
IV-A Simulation Data
We use hyperspectral images encoded with a random binary (Bernoulli(0.5)) mask to simulate the CASSI measurements. The RGB images are available and well aligned with the hyperspectral images.
IV-A1 Natural scenes
We first consider the hyperspectral images of natural scenes used in [43] (available at http://personalpages.manchester.ac.uk/staff/d.h.foster/Hyperspectral_images_of_natural_scenes_04.html). There are 33 channels (400–720 nm with a 10 nm interval), and we resize each image to a common spatial size. The PSNR of each reconstructed spectral channel, with mean values and standard deviations (across wavelength channels), is presented for each algorithm in Table I. It can be seen that the proposed shrinkage method outperforms the others, especially once the RGB images are used as side information. Fig. 5 plots the reconstructed spectra of selected blocks (the average spectrum of the pixels inside each block is used) in the scene compared to the ground truth. It can be seen that: (i) though TwIST provides the lowest PSNR, the reconstructed spectrum is usually correct; (ii) the spectrum reconstructed by the proposed algorithm improves significantly when side information is provided; and (iii) linearized Bregman presents the worst spectrum, mainly due to the DCT used in the spectral domain for inversion.

IV-A2 Bird data
Next we consider the 24-channel bird data (Fig. 6) measured by a hyperspectral camera in [18]. The RGB image is also available. Fig. 7 shows the reconstructed images using the proposed algorithm with and without side information. The left part of Fig. 8 plots the reconstruction PSNR of each channel using the different algorithms. Again, our shrinkage blind CS method coupled with the RGB image provides the best result. The right part of Fig. 8 shows the reconstructed images of five selected channels using the different algorithms. The TwIST results are characterized by lost detail (over-smoothed birds), while our model's results without the RGB images appear noisy. Fig. 9 depicts the reconstructed spectra of different birds with the different algorithms. Both TwIST and our algorithm with the RGB image provide very good matches to the ground truth, while the proposed model without RGB images and linearized Bregman do not represent the spectra well.
IV-B Real Data
In this section, we apply the proposed algorithm to real data captured by our cameras (both the original CASSI camera and the new SLMCASSI camera).
IV-B1 Object data
We first demonstrate our algorithm on data taken by the original CASSI system [8] (available at http://www.disp.duke.edu/projects/CASSI/experimentaldata/index.ptml). In these experiments, the reconstructions have 33 spectral channels (marked on the reconstruction in Fig. 10). There are 4 objects in the scene: a red apple, a yellow banana, a blue stapler, and a green pineapple. Fig. 10 shows the reconstruction of the proposed algorithm without the RGB image. Since the RGB image is not well aligned with the CASSI measurement, we do not show the result with side information; we found that the reconstruction with the (unaligned) RGB image looks similar to this one, owing to the simple scene used in this experiment. Fig. 11 compares selected reconstructed images inverted by the different algorithms. The proposed algorithm recovers more detail of the scene (notice the clear apple stem). Fig. 12 plots the reconstructed spectra of the four objects. TwIST [8] yields accurate spectra, and our algorithm (without side information) provides similar results. The linearized Bregman algorithm does not reconstruct this real data well; we do not show its results in the following experiments.
IV-B2 Bird data
We again consider the bird data, now captured by the multi-frame CASSI camera [17]. The RGB image is aligned manually with the CASSI measurement. We plot the spectra in Fig. 13; the reconstructed images are shown in Fig. 14. Our proposed method with side information clearly provides the best results with respect to both image clarity and spectral accuracy. Without side information, the proposed model yields clear images but reconstructs the spectra poorly. This verifies the benefit of the side information on real data.
IV-B3 Data with SLM-CASSI
Now we consider a dataset captured by the proposed camera. Since no RGB image is available, we only show the results of our algorithm without side information. Fig. 15 shows the reconstructed spectrum and selected frames for the M&M dataset. (The 30 wavelengths are 450 nm, 458 nm, 465 nm, 473 nm, 481.5 nm, 489.5 nm, 498 nm, 507 nm, 516 nm, 524.5 nm, 532.5 nm, 540.5 nm, 548.5 nm, 556.5 nm, 564.5 nm, 572.5 nm, 580.5 nm, 588.5 nm, 596 nm, 603.5 nm, 611 nm, 618.5 nm, 625.8 nm, 633.5 nm, 641 nm, 648.5 nm, 656 nm, 663.5 nm, 671 nm, and 678.5 nm.) As with the original CASSI experiments, we use a fiber-optic spectrometer (USB2000, Ocean Optics) to provide the reference spectra for the targets. Again, our algorithm provides better results than TwIST, for both the images and the spectra.
As a last example, Fig. 16 shows the images reconstructed with our method for the berry data. Notice that the leaf is prevalent in the green channels (520–590nm), while the (red) berries appear in the red portion of the spectrum (610–680nm).
V Conclusions
We have developed and tested a new Bayesian dictionary learning model for blind CS. Specifically, we have demonstrated high-quality inversion of compressed hyperspectral images captured by real cameras. Via global-local shrinkage priors, our algorithm imposes compressibility on the dictionary coefficients to extract more information from the dictionary atoms. The reconstruction quality is improved significantly by integrating the compressed measurements with RGB images as side information under the blind CS framework. We also developed a new compressive hyperspectral imaging camera that uses an SLM to perform the spectral coding. Experimental results demonstrate the feasibility of the new camera, the superior performance of the algorithm, and the benefit of side information.
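The compressibility induced by a global-local shrinkage prior can be illustrated with a small simulation: coefficients are drawn as c_i ~ N(0, τ²λ_i²) with a small global scale τ and heavy-tailed local scales λ_i (half-Cauchy here, as in the horseshoe prior [39]). Most draws concentrate near zero while a few escape shrinkage. This is an illustrative sketch only; the half-Cauchy choice and the scale values are assumptions, not the exact prior of our model.

```python
import math
import random

# Illustrative draws from a global-local shrinkage prior:
#   c_i ~ N(0, tau^2 * lambda_i^2),  lambda_i ~ half-Cauchy(0, 1).
# The half-Cauchy local scales and tau = 0.1 are assumed values chosen
# for illustration; they are not the exact prior used in the paper.

def sample_global_local(n, tau, rng):
    coeffs = []
    for _ in range(n):
        # half-Cauchy draw via the inverse CDF of |Cauchy(0, 1)|
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
        coeffs.append(rng.gauss(0.0, tau * lam))
    return coeffs

rng = random.Random(0)
c = sample_global_local(5000, 0.1, rng)
mags = sorted(abs(v) for v in c)
median, largest = mags[len(mags) // 2], mags[-1]
# heavy tails: a few coefficients dwarf the typical (shrunken) one
```

The small global scale shrinks the bulk of the coefficients toward zero, while the heavy-tailed local scales leave a few coefficients essentially unshrunk, which is the behavior exploited to extract more information from the dictionary atoms.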
The original CASSI camera utilizes a separate disperser and encoder; the SLM-CASSI is capable of modulating both dimensions with one element. The SLM-CASSI also offers the potential of video-rate compressive sensing and multiframe adaptive sensing using the 60 Hz refresh rate of the SLM. SLM-CASSI offers greater flexibility of masking functions and an additional RGB camera for side information; however, it intertwines spectral and spatial multiplexing capabilities. The coded aperture/disperser architecture used in the original CASSI camera is more reliable, has a smaller form factor (i.e., fewer optical parts), and is less expensive, but lacks the coding flexibility and the ability to use a separate camera for side information.
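The per-channel coding that both architectures implement can be sketched with a toy forward model: every spectral channel is multiplied by the shared aperture code, shifted by one column per channel by the disperser, and summed onto the monochrome detector. This is a simplified illustration, not the calibrated forward model of either camera; the array sizes and the one-pixel-per-channel dispersion shift are assumptions.

```python
# Toy CASSI forward model: code each spectral channel, shift it by one
# column per channel (dispersion), and sum onto a monochrome detector.
# Sizes and the one-pixel-per-channel shift are illustrative assumptions.

def cassi_forward(cube, code):
    """cube: list of L channels, each an H x W list of lists.
    code: H x W binary aperture pattern (shared by all channels).
    Returns the H x (W + L - 1) coded, dispersed, summed measurement."""
    L = len(cube)
    H, W = len(cube[0]), len(cube[0][0])
    y = [[0.0] * (W + L - 1) for _ in range(H)]
    for k, channel in enumerate(cube):  # k = spectral channel index
        for i in range(H):
            for j in range(W):
                # channel k is coded, then shifted k columns by the disperser
                y[i][j + k] += code[i][j] * channel[i][j]
    return y

# Tiny example: 2 channels on a 2x3 spatial grid, checkerboard code.
cube = [[[1, 2, 3], [4, 5, 6]],   # channel 0
        [[7, 8, 9], [1, 1, 1]]]   # channel 1
code = [[1, 0, 1], [0, 1, 0]]
y = cassi_forward(cube, code)     # 2 x 4 measurement
```

Because the shift gives each channel a distinct effective code, the summed measurement retains channel-separable structure, which is the CDMA-style principle the inversion algorithms exploit.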
In this paper, we have shown that by using the RGB image as side information in our proposed blind CS framework, the reconstruction quality of hyperspectral images can be improved significantly. This framework can be generalized to other applications and algorithms. For instance, Gaussian mixture model based dictionary learning approaches [44, 45, 46, Yang14GMM2, 47] that learn a union of subspaces can also benefit from side information.

References
 [1] K. Hege, D. O'Connell, W. Johnson, S. Basty, and E. Dereniak, “Hyperspectral imaging for astronomy and space surveillance,” Proc. SPIE, vol. 5159, pp. 380–391, 2003.
 [2] A. F. H. Goetz, G. Vane, J. E. Solomon, and B. N. Rock, “Imaging spectrometry for earth remote sensing,” Science, vol. 228, pp. 1147–1153, 1985.
 [3] B. S. Sorg, B. J. Moeller, O. Donovan, Y. Cao, and M. W. Dewhirst, “Hyperspectral imaging of hemoglobin saturation in tumor microvasculature and tumor hypoxia development,” J. Biomed. Opt., vol. 10, p. 044004, 2005.
 [4] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
 [5] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, February 2006.
 [6] M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Optics Express, vol. 15, pp. 14013–14027, 2007.
 [7] A. Wagadarikar, R. John, R. Willett, and D. J. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Applied Optics, vol. 47, no. 10, pp. B44–B51, 2008.
 [8] A. Wagadarikar, N. Pitsianis, X. Sun, and D. Brady, “Video rate spectral imaging using a coded aperture snapshot spectral imager,” Optics Express, vol. 17, no. 8, pp. 6368–6388, 2009.
 [9] C. Li, T. Sun, K. F. Kelly, and Y. Zhang, “A compressive sensing and unmixing scheme for hyperspectral data processing.” IEEE Transactions on Image Processing, vol. 21, no. 3, pp. 1200–1210, 2012.
 [10] B. S. Sorg, B. J. Moeller, O. Donovan, Y. Cao, and M. W. Dewhirst, “Hyperspectral imaging of hemoglobin saturation in tumor microvasculature and tumor hypoxia development,” J. Biomed. Opt., vol. 10, p. 044004, 2005.
 [11] E. Herrala, J. T. Okkonen, T. S. Hyvarinen, M. Aikio, and J. Lammasniemi, “Imaging spectrometer for process industry application,” Proc. SPIE, vol. 2248, pp. 33–40, 1994.
 [12] H. Morris, C. Hoyt, and P. Treado, “Imaging spectrometers for fluorescence and raman microscopy: acoustooptic and liquid crystal tunable filters,” Appl. Opt., vol. 34, pp. 4817–4826, 1995.
 [13] J. Bioucas-Dias and M. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2992–3004, December 2007.
 [14] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–597, December 2007.
 [15] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, “Bregman iterative algorithms for minimization with applications to compressed sensing,” SIAM J. Imaging Sci, pp. 143–168, 2008.

 [16] X. Yuan, P. Llull, X. Liao, J. Yang, G. Sapiro, D. J. Brady, and L. Carin, “Low-cost compressive sensing for color video and depth,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
 [17] D. Kittle, K. Choi, A. Wagadarikar, and D. J. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Applied Optics, vol. 49, no. 36, pp. 6824–6833, December 2010.
 [18] A. Rajwade, D. Kittle, T.-H. Tsai, D. Brady, and L. Carin, “Coded hyperspectral imaging and blind compressive sensing,” SIAM Journal on Imaging Sciences, vol. 6, no. 2, pp. 782–812, 2013.
 [19] S. Gleichman and Y. C. Eldar, “Blind compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6958–6975, 2011.
 [20] J. M. DuarteCarvajalino, G. Yu, L. Carin, and G. Sapiro, “Taskdriven adaptive statistical compressive sensing of Gaussian mixture models,” IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 585–600, February 2013.
 [21] V. Cevher, “Learning with compressible priors,” Advances in Neural Information Processing Systems (NIPS), pp. 261–269, 2009.
 [22] N. G. Polson and J. G. Scott, “Shrink globally, act locally: Sparse Bayesian regularization and prediction,” Bayesian Statistics, 2010.
 [23] ——, “Local shrinkage rules, Lévy processes and regularized regression,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 74, no. 2, pp. 287–311, 2012.
 [24] X. Yuan, V. Rao, S. Han, and L. Carin, “Hierarchical infinite divisibility for multiscale shrinkage,” IEEE Transactions on Signal Processing, vol. 62, no. 17, pp. 4363–4374, Sep. 1 2014.
 [25] Y. Wu, I. O. Mirza, G. R. Arce, and D. W. Prather, “Development of a digitalmicromirrordevicebased multishot snapshot spectral imaging system,” Opt. Lett., vol. 36, pp. 2692–2694, 2011.
 [26] G. Lazarev, A. Hermerschmidt, S. Kruger, and S. Osten, Optical Imaging and Metrology: Advanced Technologies. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, 2012.
 [27] M. E. Gehm and J. Kinast, “Adaptive spectroscopy: towards adaptive spectral imaging,” Proc. SPIE, vol. 6978, p. 69780I, 2008.
 [28] J. M. DuarteCarvajalino and G. Sapiro, “Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,” IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1395–1408, July 2009.
 [29] A. Averbuch, S. Dekel, and S. Deutsch, “Adaptive compressed image sensing using dictionaries,” SIAM Journal on Imaging Sciences, vol. 5, no. 1, pp. 57–89, 2012.
 [30] W. Carson, M. Chen, M. Rodrigues, R. Calderbank, and L. Carin, “Communications inspired projection design with application to compressive sensing,” SIAM Journal on Imaging Sciences, vol. 5, no. 4, pp. 1185–1212, 2012.
 [31] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin, “Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 130–144, January 2012.
 [32] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Transactions on Image Processing, vol. 17, pp. 53–69, January 2008.
 [33] H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, Dec. 2006.
 [34] Z. Xing, M. Zhou, A. Castrodad, G. Sapiro, and L. Carin, “Dictionary learning for noisy and incomplete hyperspectral images,” SIAM Journal on Imaging Sciences, vol. 5, no. 1, pp. 33–56, Jan. 2012.
 [35] A. Bhattacharya and D. B. Dunson, “Sparse Bayesian infinite factor models,” Biometrika, vol. 98, no. 2, pp. 291–306, 2011.
 [36] M. Aharon, M. Elad, and A. Bruckstein, “KSVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
 [37] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, pp. 3736–3745, December 2006.
 [38] A. Armagan, M. Clyde, and D. B. Dunson, “Generalized beta mixtures of Gaussians,” Advances in Neural Information Processing Systems (NIPS), 2011.
 [39] C. M. Carvalho, N. G. Polson, and J. G. Scott, “The horseshoe estimator for sparse signals,” Biometrika, vol. 97, no. 2, pp. 465–480, 2010.
 [40] P. Garrigues and B. A. Olshausen, “Group sparse coding with a laplacian scale mixture prior,” Advances in Neural Information Processing Systems (NIPS), pp. 676–684, 2010.
 [41] J. E. Griffin and P. J. Brown, “Bayesian hyper-lassos with non-convex penalization,” Australian & New Zealand Journal of Statistics, vol. 53, no. 4, pp. 423–442, 2011.

 [42] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML), 2009, pp. 689–696.
 [43] D. H. Foster, K. Amano, S. M. C. Nascimento, and M. J. Foster, “Frequency of metamerism in natural scenes,” Journal of the Optical Society of America A, vol. 23, no. 12, pp. 2359–2372, 2006.
 [44] M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, “Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds,” IEEE Transactions on Signal Processing, vol. 58, no. 12, pp. 6140–6155, December 2010.
 [45] J. Yang, X. Yuan, X. Liao, P. Llull, G. Sapiro, D. J. Brady, and L. Carin, “Video compressive sensing using Gaussian mixture models,” IEEE Transactions on Image Processing, vol. 23, no. 11, pp. 4863–4878, November 2014.
 [46] G. Yu and G. Sapiro, “Statistical compressed sensing of Gaussian mixture models,” IEEE Transactions on Signal Processing, vol. 59, no. 12, pp. 5842–5858, 2011.
 [47] G. Yu, G. Sapiro, and S. Mallat, “Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity,” IEEE Transactions on Image Processing, 2012.

 [48] M. J. Beal, “Variational algorithms for approximate Bayesian inference,” Ph.D. dissertation, University College London, 2003.
 [49] S. Kullback and R. A. Leibler, “On information and sufficiency,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
VI Model and Inference
The full statistical model is specified by Eqs. (41)–(52).
The posterior density of the model parameters is given by Eq. (53).
Via MCMC Inference

Each parameter is drawn in turn from its conditional posterior: Eqs. (54)–(57), (58)–(60), and (61) give the updates for the first three blocks of parameters. Eq. (62) draws from an inverse-Gaussian distribution, and Eq. (63) draws from the generalized inverse Gaussian (GIG) distribution, whose density is

\mathrm{GIG}(x \mid p, a, b) = \frac{(a/b)^{p/2}}{2 K_p(\sqrt{ab})}\, x^{p-1} e^{-(ax + b/x)/2}, \quad x > 0,

where K_p(\cdot) is the modified Bessel function of the second kind. The updates in Eqs. (64)–(66) depend on the number of nonzero entries in the corresponding vector, and Eqs. (67)–(68) give the remaining updates.
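The inverse-Gaussian draw required by Eq. (62) can be implemented with the standard transformation method of Michael, Schucany, and Haas. A minimal sketch follows; the parameter values μ = 2, λ = 3 are arbitrary illustrative choices, not those of the model.

```python
import math
import random

# Inverse-Gaussian sampler IG(mu, lam) via the transformation method of
# Michael, Schucany, and Haas (1976), as needed for the draw in Eq. (62).
# mu is the mean and lam the shape parameter: E[X] = mu, Var[X] = mu^3/lam.

def sample_inverse_gaussian(mu, lam, rng):
    y = rng.gauss(0.0, 1.0) ** 2
    # smaller root of the quadratic induced by the transformation
    x = mu + (mu * mu * y) / (2.0 * lam) \
        - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2)
    # accept x with probability mu / (mu + x); else return the mirror root
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x

rng = random.Random(1)
mu, lam = 2.0, 3.0  # arbitrary illustrative parameters
draws = [sample_inverse_gaussian(mu, lam, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)  # should be close to mu
```

The two roots of the underlying quadratic multiply to μ², so both are positive; the acceptance step selects between them with the correct probability, yielding exact (non-MCMC) draws for this conditional.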