“mapping” an algorithm onto a PE, particularly when using
heterogeneous hardware platforms. This in turn highlights the
need for quick “mapping” exploration at a high abstraction
level, presumably at the component level, to reduce the time
needed for the exploration. We have derived
equations that accurately estimate the cycle counts and that
can be used by tools to perform mapping exploration
automatically.
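As a rough illustration of how a tool might use such cycle-count equations, the following minimal Python sketch enumerates all component-to-PE assignments and keeps those that meet a latency constraint. Everything in it is an assumption made for illustration: the component names (`sqrd`, `detect`), the PE names (`dsp`, `asip`), the clock rates, and the polynomial coefficients are placeholders and are not the equations derived in this paper. The sketch also assumes that components execute sequentially and ignores communication cost between PEs.

```python
from itertools import product

# Hypothetical cycle-count models: cycles as a function of the number of
# antennas n. The coefficients are placeholders, NOT the paper's equations.
CYCLE_MODELS = {
    ("sqrd", "dsp"): lambda n: 40 * n**3 + 120 * n**2 + 300,
    ("sqrd", "asip"): lambda n: 12 * n**3 + 60 * n**2 + 150,
    ("detect", "dsp"): lambda n: 25 * n**2 + 100,
    ("detect", "asip"): lambda n: 10 * n**2 + 80,
}

# Assumed clock rates of the two hypothetical PEs, in Hz.
PE_CLOCK_HZ = {"dsp": 1.0e9, "asip": 4.0e8}


def estimate_latency(mapping, n):
    """Sum the estimated execution times, assuming sequential execution."""
    return sum(
        CYCLE_MODELS[(component, pe)](n) / PE_CLOCK_HZ[pe]
        for component, pe in mapping.items()
    )


def explore_mappings(components, pes, n, max_latency_s):
    """Try every component-to-PE assignment; return feasible ones, fastest first."""
    feasible = []
    for assignment in product(pes, repeat=len(components)):
        mapping = dict(zip(components, assignment))
        latency = estimate_latency(mapping, n)
        if latency <= max_latency_s:
            feasible.append((latency, mapping))
    return sorted(feasible, key=lambda item: item[0])


if __name__ == "__main__":
    results = explore_mappings(
        ["sqrd", "detect"], ["dsp", "asip"], n=4, max_latency_s=5.3e-6
    )
    for latency, mapping in results:
        print(f"{latency * 1e6:.3f} us  {mapping}")
```

Because only closed-form estimates are evaluated, such an exploration needs no simulation or profiling run per candidate mapping. Exhaustive enumeration grows exponentially with the number of components, so a real tool would likely need a smarter search strategy.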
VII. CONCLUSIONS AND OUTLOOK
In this paper, we have presented an analysis of the flexibility
vs. efficiency tradeoffs that need to be made while developing
SDRs, using the computation-intensive MMSE-SQRD,
which is widely used in MIMO receivers, as a case study. Two
algorithms for performing SQRD were investigated and efficiently
implemented in several versions. Flexible versions
with varying degrees of reusability, portability
and efficiency have been implemented. Though the dedicated
implementations differ in processing time by a good margin
when compared to the flexible implementations, efficient
implementations with reusable algorithms can still provide flexibility
and can be used in scenarios with tight constraints, such as
latency. Accurate equations that can be used for performing
constraint-aware mapping at a high abstraction level with
tool assistance have been derived. As an outlook, the analysis
of flexibility vs. efficiency tradeoffs can be continued by
implementing more computation-intensive wireless physical
layer algorithms on different hardware architectures.