Chapter 3 Image Reconstruction Algorithm and Processor
3.3 The results of software simulation and analysis
Figure 3.2 shows the flow chart of the simulation, which is carried out in MATLAB. First, the geometry of the sample is defined by the medium size and the locations of the source-detector pairs. In total, six diffuse optical sources and twelve photo-detectors are placed as shown in figure 3.3. The area of the frame is 4 cm × 6 cm and the voxel size is $(0.25\ \mathrm{cm})^3$. Second, the medium optical coefficients are assigned to each voxel. The background medium is homogeneous, with absorption coefficient $\mu_a^h = 0.05\ \mathrm{cm}^{-1}$ and reduced scattering coefficient $\mu_s' = 10\ \mathrm{cm}^{-1}$.
Figure 3.1 The relation between sub-frames and one frame.
Figure 3.3 The distribution of sources and detectors.
Figure 3.2 The flow chart of simulation of forward model and inverse solution.
The mean square error (MSE), defined in Eq. (30), and the computational time of the two modes are computed for the first and second media.
$$\mathrm{MSE} = \underset{i \in n}{\mathrm{mean}} \left( \left( \mu_a^{\mathrm{true}}(i) - \mu_a^{\mathrm{recon}}(i) \right)^2 \right) \tag{30}$$
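Eq. (30) can be sketched in a few lines of Python. This is only an illustration, not the MATLAB code used for the simulation; the function name `mse`, the flat lists of per-voxel absorption coefficients, and the sample values are all assumptions made for the example.

```python
def mse(mu_true, mu_recon):
    """Mean of the squared per-voxel differences, following Eq. (30)."""
    if len(mu_true) != len(mu_recon) or not mu_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - r) ** 2 for t, r in zip(mu_true, mu_recon)) / len(mu_true)


# Hypothetical 4-voxel example (values in cm^-1); not data from this work.
mu_true = [0.05, 0.05, 0.10, 0.05]
mu_recon = [0.05, 0.06, 0.08, 0.05]
print(mse(mu_true, mu_recon))
```

The tables in this section report the same quantity scaled by 10^4.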
The results are shown in Table 6 and Table 7. For the first medium, the MSE of the sub-frame mode is larger than that of the frame mode in most cases. However, the frame mode may not improve the visual quality significantly, especially when the truncation parameter is set to 4. In addition, the computational time of the frame mode is more than one hundred times that of the sub-frame mode (roughly 5.3 to 6.3 s versus 0.034 to 0.041 s). This large time cost arises because the iterative JSVD algorithm has to process a much larger matrix in the frame mode. It can also be observed that decreasing the truncation parameter tends to reduce the computational time.
For the second medium, surprisingly, the MSE of the sub-frame mode is always lower than that of the frame mode, and the image quality of the sub-frame mode is also better. The main reason is that the anomaly is located right at the center of a sub-frame. This is not intended to claim that the sub-frame mode is definitely superior to the frame mode. However, the simulation results demonstrate that, given the inherent limitation on image resolution in CW-DOT systems, the sub-frame mode is a good technique for dramatically reducing the computational cost while maintaining a reasonably good reconstruction quality.
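For reference, the role of the truncation parameter can be written out as follows. This is the textbook form of a truncated-SVD solution, given as a sketch only: the symbols $A$ (sensitivity matrix of a frame or sub-frame), $b$ (measurement vector), $x_k$ (reconstructed per-voxel perturbation) and $k$ (truncation parameter) are assumed names and may differ from the notation used elsewhere in this work.

```latex
A = U \Sigma V^{T} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T},
\qquad
x_k = \sum_{i=1}^{k} \frac{u_i^{T} b}{\sigma_i} \, v_i,
\qquad k \le r .
```

Keeping fewer terms (a smaller $k$) discards the small singular values that amplify measurement noise, at the price of a coarser reconstruction.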
Table 6 Compute time and MSE of frame mode.
Truncation   First medium                      Second medium
parameter    Compute time (s)   MSE (×10^-4)   Compute time (s)   MSE (×10^-4)
24           5.780064           86             6.326128           176
18           5.687689           80             5.507522           129
12           5.456757           66             5.318873           81
6            5.421226           178            5.372787           81
3.4 The design of the JSVD processor
The processor solves the inverse problem, so speed is not its main requirement. It performs biomedical signal processing and is intended for portable instruments. Therefore, the main goals of the processor are high precision, small area, and low power consumption. The fixed-point simulation is done in MATLAB, and the hardware resources are designed to meet these criteria.
3.4.1 The architecture of processor
The processor contains four main blocks: the CORDIC engines, the control unit of the CORDIC engines, the memory control unit, and the dual-port memories. A simplified, general block diagram of the architecture is shown in figure 3.9.
Table 7 Compute time and MSE of sub-frame mode.
Truncation   First medium                      Second medium
parameter    Compute time (s)   MSE (×10^-4)   Compute time (s)   MSE (×10^-4)
4            0.034519           99             0.034079           78
3            0.034326           149            0.040209           78
2            0.034536           202            0.033942           78
1            0.034241           252            0.040864           78
Figure 3.9 The simplified, general block diagram of architecture.
The CORDIC engine is implemented with the basic structure [25]. Other structures, such as the radix-2 CORDIC, are faster than the basic structure, but they occupy more area, and speed is not a requirement here. The basic architecture of the CORDIC engine is presented in figure 3.10.
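To make the shift-and-add idea behind a basic CORDIC engine concrete, the following is a floating-point sketch of rotation-mode CORDIC. It is not the fixed-point datapath of figure 3.10: the function name `cordic_sin_cos` and the default of 14 iterations are assumptions chosen for illustration, and multiplications by powers of two stand in for the hardware shifts.

```python
import math


def cordic_sin_cos(theta, iterations=14):
    """Rotation-mode CORDIC; returns (sin(theta), cos(theta)).

    Converges for |theta| up to about 1.74 rad.
    """
    # Elementary angles atan(2^-i); a hardware engine would keep these in a table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Constant gain of the micro-rotations, compensated by pre-scaling x.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0      # rotate toward z = 0
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


print(cordic_sin_cos(0.5))
```

Each iteration needs only two shifts and three additions, which is why the basic structure is small; the accuracy improves by roughly one bit per iteration.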
The 2-by-2 SVD is performed by the parallel diagonalization SVD method, which is based on determining $\theta_{sum}$ and $\theta_{diff}$ directly. This method reduces both the computation time and the area of the 2-by-2 SVD. Two CORDIC engines are therefore used to carry out the computation in parallel.
The algorithm of parallel diagonalization SVD:
Begin
    Parallel do: b+c, c−b, d−a, d+a
    Parallel do begin
        Find θsum = (θr + θl);
        Find θdiff = (θr − θl);
    End
    Parallel do: separate θr, θl
    Parallel find sine/cosine of θr, θl using the CORDIC engines
End
Figure 3.10 The basic architecture of CORDIC engine.
The algorithm can be divided into four stages for implementation in hardware, and the two CORDIC engines can be reused across the stages to obtain the result of the SVD. Table 8 shows the input and output of each stage. The SVD control unit is designed as a finite state machine that executes each stage by reusing the two CORDIC engines.
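The four steps of the parallel diagonalization can be sketched in Python as follows. This is a floating-point illustration under stated assumptions, not the hardware implementation: the 2-by-2 matrix is taken to be [[a, b], [c, d]], the rotation convention is R(θ) = [[cos θ, sin θ], [−sin θ, cos θ]], `math.atan2`, `math.cos` and `math.sin` stand in for the CORDIC engines, and all function names are hypothetical.

```python
import math


def rot(theta):
    """Plane rotation R(theta) = [[cos, sin], [-sin, cos]] (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def matmul2(x, y):
    """Product of two 2-by-2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(x):
    return [[x[0][0], x[1][0]], [x[0][1], x[1][1]]]


def svd_angles_2x2(a, b, c, d):
    """Return (theta_l, theta_r) from the sum and difference angles."""
    theta_sum = math.atan2(b + c, d - a)    # theta_r + theta_l
    theta_diff = math.atan2(c - b, d + a)   # theta_r - theta_l
    theta_r = 0.5 * (theta_sum + theta_diff)
    theta_l = 0.5 * (theta_sum - theta_diff)
    return theta_l, theta_r


def diagonalize_2x2(a, b, c, d):
    """Compute R(theta_l)^T * M * R(theta_r), which is diagonal."""
    theta_l, theta_r = svd_angles_2x2(a, b, c, d)
    m = [[a, b], [c, d]]
    return matmul2(matmul2(transpose2(rot(theta_l)), m), rot(theta_r))


print(diagonalize_2x2(1.0, 2.0, 3.0, 4.0))
```

The two `atan2` calls are independent of each other, as are the two sine/cosine evaluations, which is what allows two engines to work side by side. The diagonal entries produced this way may be negative or unsorted; their absolute values are the singular values.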
3.4.2 Memory controller
Table 8 The input and output of each stage. (Columns: CORDIC Engine 1, CORDIC Engine 2; one row per stage.)
This module extracts the 2-by-2 submatrix from the large matrix and updates the memory by multiplying the old $A_i$ by $J(p, q, \theta_r)$ and $J(p, q, \theta_l)$. To reduce the number of memory accesses and the time needed to refresh the memory, the data flow is scheduled to be parallel and efficient. Inspecting the elements of $J(p, q, \theta_r)$ and $J(p, q, \theta_l)$ shows that the positions of the non-trivial entries are determined by the pair $(p, q)$, and that the remaining diagonal elements are one. Hence, only the elements of $A_i$ in rows and columns $p$ and $q$ change.
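The locality of this update can be illustrated with a short Python sketch. It assumes that $J(p, q, \theta)$ is the identity except for the entries J[p][p] = J[q][q] = cos θ, J[p][q] = sin θ and J[q][p] = −sin θ, and that the update has the form J(p, q, θl)^T · A · J(p, q, θr); both the convention and the function name are assumptions for the example, and no scheduling or memory banking is modeled.

```python
import math


def apply_two_sided_rotation(A, p, q, theta_l, theta_r):
    """Return J(p,q,theta_l)^T * A * J(p,q,theta_r) without forming J."""
    n = len(A)
    cl, sl = math.cos(theta_l), math.sin(theta_l)
    cr, sr = math.cos(theta_r), math.sin(theta_r)
    B = [row[:] for row in A]          # work on a copy; A is left unchanged
    # Left rotation: only rows p and q are read and written.
    for j in range(n):
        bp, bq = B[p][j], B[q][j]
        B[p][j] = cl * bp - sl * bq
        B[q][j] = sl * bp + cl * bq
    # Right rotation: only columns p and q are read and written.
    for i in range(n):
        bp, bq = B[i][p], B[i][q]
        B[i][p] = cr * bp - sr * bq
        B[i][q] = sr * bp + cr * bq
    return B


# Hypothetical 4-by-4 example; the values are arbitrary.
A = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 17.0]]
B = apply_two_sided_rotation(A, 1, 2, 0.3, -0.7)
unchanged = [(i, j) for i in range(4) for j in range(4) if B[i][j] == A[i][j]]
print(unchanged)
```

Elements outside rows and columns p and q never enter either loop, so they need not be fetched from memory at all. The four elements at the intersections of rows and columns p and q receive both rotations, while the rest of those rows and columns receive only one.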
For this reason, only the data that change are updated, and the data that stay the same are left untouched. The changed data are divided into three types. The first type consists of the elements whose row and column both belong to the pair $(p, q)$; these are marked with green circles in figure 3.11. The second and third types consist of the elements for which only the column or only the row belongs to $(p, q)$; these are marked with red circles. The categories of these data are
Once the characteristics of these data are known, the multipliers are designed to be reused and to carry out the calculations in parallel. Before the data are updated, the products of $\cos\theta_r$, $\sin\theta_r$, $\cos\theta_l$, and $\sin\theta_l$ are computed first; the resulting products are denoted x, y, z, and s.
The schedule of the data flow used to update $A_i$ is presented in figure 3.12.
Figure 3.11 The example of different data types.
The update procedures for $U_i$ and $V_i$ follow the same methodology and share the other four multipliers. With this method, two elements of a matrix can be updated in parallel and the memories are accessed efficiently.
Figure 3.13 The second renew flow of the data input.
3.4.3 The spec of the JSVD processor
The circuit is implemented in the Verilog hardware description language.
Although speed is not the main goal of the processor, it can run at 200 MHz. The total cell area is 248180 with the UMC 90 nm library, of which the combinational logic occupies 102804 and the non-combinational logic 145376. The result is synthesized by the ncverilog of Synopsys. The fixed-point JSVD processor can decompose a 16-by-16 matrix and provides CORDIC engines with 14-bit precision. It also allows the number of iterations to be limited. One iteration on a 4×16 matrix takes only 160 μs.
Conclusion
In this paper, a prototype of a portable CW-DOT system is proposed. The system is used to verify the reconstruction algorithm, which comprises the forward model and the inverse solution.
To reduce the computational complexity and enhance the quality of the CW-DOT images, the truncated and Jacobi SVD algorithms are applied to the inverse solution in CW-DOT systems.
Two reconstruction modes are also provided: the sub-frame mode and the frame mode. The sub-frame mode has been shown to reduce the computational overhead of reconstruction. Inhomogeneous media with different shapes and locations are simulated to study the impact of the reconstruction mode on image quality. The simulation demonstrates that a low computational cost is possible without severely harming the image quality. In short, the truncated and Jacobi SVD is a highly efficient technique for reconstructing good-quality images and is suitable for CW-DOT systems.
Moreover, a high-precision 16-bit image reconstruction processor is also proposed. The processor is designed for small area, low power consumption, and reasonable precision. In conclusion, this study reduces the volume of CW-DOT systems by implementing an algorithm with low computational overhead in VLSI. The algorithm is also tested by simulation and by emulation with the prototype.
In the future, the image reconstruction processor can be combined with other biomedical signal processors into an SoC and operate with wireless handheld devices. In this way, the devices can benefit more doctors, more patients, and more researchers.
References
[1] W.-C. Kuo, "Introduction to non-invasive biomedical tomographic imaging" (in Chinese), 物理雙月刊 (Physics Bimonthly), vol. 28, pp. 698-703, 2006.
[2] Aaron G. Filler, "The History, Development and Impact of Computed Imaging in Neurological Diagnosis and Neurosurgery: CT, MRI, and DTI," 2009.
[3] J. Ollinger and J. Fessler, "Positron-emission tomography," IEEE Signal Processing Magazine, vol. 14, pp. 43-55, 1997.
[4] A. Fercher, W. Drexler, C. Hitzenberger, and T. Lasser, "Optical coherence tomography-principles and applications," Reports on Progress in Physics, vol. 66, pp. 239-303, 2003.
[5] S. Arridge and M. Schweiger, "Image reconstruction in optical tomography," Philosophical Transactions: Biological Sciences, vol. 352, pp. 717-726, 1997.
[6] M. Schweiger, "Computational aspects of diffuse optical tomography," in Computing in Science & Engineering. vol. 5, A. Gibson, Ed., 2003, pp. 33-41.
[7] V. Kondepati, H. Heise, and J. Backhaus, "Recent applications of near-infrared spectroscopy in cancer diagnosis and therapy," Analytical and Bioanalytical Chemistry, vol. 390, pp. 125-139, 2008.
[8] T. M. Benson, "Modified simultaneous iterative reconstruction technique for faster parallel computation," in Nuclear Science Symposium Conference Record, 2005 IEEE. vol. 5, J. Gregor, Ed., 2005, pp. 2715-2718.
[9] M. Jiang and G. Wang, "Convergence of the simultaneous algebraic reconstruction technique (SART)," IEEE Transactions on Image Processing, vol. 12, pp. 957-961, 2003.
[10] R. J. Gaudette, D. H. Brooks, C. A. DiMarzio, M. E. Kilmer, E. L. Miller, T. Gaudette, and D. A. Boas, "A comparison study of linear reconstruction techniques for diffuse optical tomographic imaging of absorption coefficient," Physics in Medicine and Biology, vol. 45, pp. 1051-1051, 2000.
[11] C. Sau-Gee and C. Chin-Chi, "A new efficient algorithm for singular value decomposition," in Circuits and Systems, 1999. ISCAS '99. Proceedings of the 1999 IEEE International Symposium on, 1999, pp. 523-526 vol.5.
[12] G. Strangman, D. A. Boas, and J. P. Sutton, "Non-invasive neuroimaging using near-infrared light," Biological Psychiatry, vol. 52, pp. 679-693, 10/1 2002.
[13] A. Bozkurt, A. Rosen, H. Rosen, and B. Onaral, "A portable near infrared spectroscopy system for bedside monitoring of newborn brain," BioMedical Engineering OnLine, vol. 4, p. 29, 2005.
[14] S. C. Bunce, M. Izzetoglu, K. Izzetoglu, B. Onaral, and K. Pourrezaei, "Functional near-infrared spectroscopy," Engineering in Medicine and Biology Magazine, IEEE, vol. 25, pp. 54-62, 2006.
[15] R. X. Xu and S. P. Povoski, "Diffuse optical imaging and spectroscopy for cancer," Expert Review of Medical Devices, vol. 4, pp. 83-95, 2007.
[16] W. F. Cheong, "A review of the optical properties of biological tissues," in Quantum Electronics, IEEE Journal of. vol. 26, S. A. Prahl, Ed., 1990, pp. 2166-2185.
[17] C. Schmitz, D. Klemer, R. Hardin, M. Katz, Y. Pei, H. Graber, M. Levin, R. Levina, N. Franco, and W. Solomon, "Design and implementation of dynamic near-infrared optical tomographic imaging instrumentation for simultaneous dual-breast measurements," Applied Optics, vol. 44, pp. 2140-2153, 2005.
[18] D. A. Boas, D. H. Brooks, E. L. Miller, C. A. DiMarzio, M. Kilmer, R. J. Gaudette, and Z. Quan, "Imaging the body with diffuse optical tomography," Signal Processing Magazine, IEEE, vol. 18, pp. 57-75, 2001.
[19] D. Boas, T. Gaudette, and S. Arridge, "Simultaneous imaging and optode calibration with diffuse optical tomography," Optics Express, vol. 8, pp. 263-270, 2001.
[20] A. Ribes and F. Schmitt, "Linear inverse problems in imaging," in Signal Processing Magazine, IEEE. vol. 25, 2008, pp. 84-99.
[21] U. Schwiegelshohn and L. Thiele, "A systolic algorithm for cyclic-by-rows SVD," in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87., 1987, pp. 768-770.
[22] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," 1998, pp. 191-200.
[23] M. Rahmati, M. S. Sadri, and M. A. Naeini, "FPGA based singular value decomposition for image processing applications," in Application-Specific Systems, Architectures and Processors, 2008. ASAP 2008. International Conference on, 2008, pp. 185-190.
[24] W. Ma, M. E. Kaye, D. M. Luke, and R. Doraiswami, "An FPGA-Based Singular Value Decomposition Processor," in Electrical and Computer Engineering, Canadian Conference on, 2006, pp. 1047-1050.
[25] J. Cavallaro and F. Luk, "CORDIC Arithmetic for an SVD Processor," Journal of parallel and distributed computing, vol. 5, pp. 271-290, 1988.