Digital Signal Processing Moslem Amiri, Václav Přenosil Masaryk University Understanding Digital Signal Processing, Third Edition, Richard Lyons (0-13-261480-4) © Pearson Education, 2011. The Fast Fourier Transform 2 Relationship of FFT to DFT  Radix-2 FFT algorithm  A very efficient process for performing DFTs under constraint that DFT size be an integral power of two  Radix-2 FFT greatly reduces the number of necessary arithmetic operations  The number of complex multiplications necessary for an N-point DFT is N2  The number of complex multiplications for an Npoint FFT is approximately (N/2)log2N ∑ − = − = 1 0 /2 )()( N n Nmnj enxmX π 3 Relationship of FFT to DFT 4 Relationship of FFT to DFT  FFT is not an approximation of DFT  It’s exactly equal to DFT  All of performance characteristics of DFT, output symmetry, linearity, output magnitudes, leakage, scalloping loss, etc., also describe the behavior of FFT 5 Hints on Using FFTs in Practice  Sample fast enough and long enough  Sampling rate must be greater than twice the bandwidth of continuous A/D input signal  Sample at 2.5 to 4 times the signal bandwidth  If we don’t know signal’s bandwidth, we should mistrust any FFT results that have significant spectral components at frequencies near half fs  Be suspicious of aliasing if there are any spectral components whose frequencies depend on fs  If we suspect that aliasing is occurring or continuous signal contains broadband noise, we’ll have to use an analog lowpass filter prior to A/D conversion  Cutoff frequency of lowpass filter must be greater than frequency band of interest but less than half fs 6 Hints on Using FFTs in Practice  Sample fast enough and long enough  How many samples must we collect  Data collection time interval must be long enough to satisfy desired FFT frequency resolution for given fs  Total data collection time interval is N/fs seconds, and N-point FFT bin-to-bin frequency resolution is fs/N Hz  For example, if we need a spectral resolution of 5 Hz, then fs/N = 5 Hz, and  If fs is, say, 10 kHz, then N must be at least 2000, and we’d choose N = 2048 because this number is a power of two s ss f ff N 2.0 5resolutiondesired === 7 Hints on Using FFTs in Practice  Manipulating time data prior to transformation  If length of time-domain data sequence is not an integral power of two, we have two options  Discard enough data samples so that remaining sequence length is some integral power of two  Not recommended  A better approach is to append enough zerovalued samples to the end of time data sequence to match the number of points of the next largest radix-2 FFT  Zero-padding technique 8 Hints on Using FFTs in Practice  Manipulating time data prior to transformation  We can multiply time data by a window function to alleviate leakage problem  But frequency resolution is degraded when windows are used  If appending zeros is necessary to extend a time sequence, append zeros after multiplying original time data sequence by a window function 9 Hints on Using FFTs in Practice  Manipulating time data prior to transformation  Even when windowing is employed, high-level spectral components can obscure nearby lowlevel spectral components  This is especially evident when original time data has a nonzero average, i.e., it’s riding on a DC bias  A large-amplitude DC spectral component at 0 Hz will overshadow its spectral neighbors  We can eliminate this problem by calculating average of time sequence and subtracting that average value from each sample in original sequence  The averaging and subtraction process must be performed before windowing 10 Hints on Using FFTs in Practice  Enhancing FFT results  To detect signal energy in presence of noise (enough time-domain data is available), we can improve sensitivity of processing by averaging multiple FFTs  A 2N-point real sequence can be transformed with a single N-point complex radix-2 FFT to speed up our processing  If we need FFT of unwindowed and also windowed time-domain data, we can perform FFT of unwindowed data, and then we can perform frequency-domain windowing to reduce spectral leakage on any, or all, of FFT bin outputs 11 Hints on Using FFTs in Practice  Interpreting FFT results  First step in interpreting FFT results is to compute absolute frequency of individual FFT bin centers  Like DFT, FFT bin spacing is fs/N  For m = 0, 1, 2, 3, . . ., N−1, absolute frequency of mth bin center is mfs/N  If FFT’s input time samples are real, only X(m) outputs from m = 0 to m = N/2 are independent  We need determine only absolute FFT bin frequencies for m over range of 0 ≤ m ≤ N/2  If FFT input samples are complex, all N of FFT outputs are independent, and we should compute absolute FFT bin frequencies for m over range of 0 ≤ m ≤ N−1 12 Hints on Using FFTs in Practice  Interpreting FFT results  We can determine true amplitude of time-domain signals from their FFT spectral results  Radix-2 FFT outputs are complex  FFT output magnitude samples are all inherently multiplied by factor N/2, when input samples are real  If FFT input samples are complex, scaling factor is N  So to determine correct amplitudes of time-domain sinusoidal components, divide FFT magnitudes by N/2 for real inputs and N for complex inputs )()()( real mjXmXmX imag+= 2 imag 2 realmag )()()()( mXmXmXmX +== 13 Hints on Using FFTs in Practice  Interpreting FFT results  If a window function was used on original timedomain data, some of FFT input samples will be attenuated  This reduces the resultant FFT output magnitudes from their true unwindowed values  To calculate correct amplitudes of various time-domain sinusoidal components, we have to further divide FFT magnitudes by appropriate processing loss factor associated with the window function used 14 Hints on Using FFTs in Practice  Interpreting FFT results  To determine power spectrum XPS(m)  Normalization through division by (|X(m)|max)2 or |X(m)|max eliminates effect of any absolute FFT scale factor (N or N/2) or window scale factor  No compensation need be performed         ⋅=         ⋅= ⋅= +== max 10 2 max 2 10 2 10 2 imag 2 real 2 )( )( log20)(normalized ))(( )( log10)(normalized dB))((log10)( )()()()( mX mX mX mX mX mX mXmX mXmXmXmX dB dB dB PS 15 Hints on Using FFTs in Practice  Interpreting FFT results  Phase angles Xø(m)  Our calculations (or compiler) should detect occurrences of Xreal(m) = 0 and set corresponding Xø(m) to 90° if Ximag(m) > 0, set Xø(m) to 0° if Ximag(m) = 0, and set Xø(m) to −90° if Ximag(m) < 0  FFT outputs containing significant noise components can cause large fluctuations in the computed Xø(m) phase angles  Xø(m) samples are meaningful when corresponding |X(m)| is well above average FFT output noise level         = − )( )( tan)( real 1 mX mX mX imag φ 16 Derivation of Radix-2 FFT Algorithm  where m is in range 0 to N/2−1  Index m has that reduced range because each of the two N/2-point DFTs on the right side are periodic in m with period N/2 ∑∑ ∑∑ ∑∑ ∑ − = − = == = − = − = = − = +− − = − − = − ++= → ++= → ++= = − − − 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2 1)2/( 0 2 1)2/( 0 /)12(2 1)2/( 0 /)2(2 1 0 /2 )12()2( )12()2( )12()2( )()( 2/ )2//(2 )/(222 /2 N n mn N m N N n mn N We eW N n mn N m N N n mn N eW N n Nmnj N n Nmnj N n Nmnj WnxWWnx WnxWWnx enxenx enxmX N Nj Nj N Nj N π π π ππ π 17 Derivation of Radix-2 FFT Algorithm  We have two N/2 summations whose results can be combined to give the first N/2 samples of an N-point DFT  Benefits of breaking N-point DFT into two parts  Reduction of number crunching because W terms in the two summations are identical  Also the upper half of DFT outputs is easy to calculate ∑∑ − = − = = ++= → − 1)2/( 0 2/ 1)2/( 0 2/ )12()2()( )2//(2 2/ N n mn N m N N n mn N eW WnxWWnxmX Nj N π 18 Derivation of Radix-2 FFT Algorithm  We just change sign of twiddle factor and use results of the two summations from X(m) to get X(m+N/2)  m goes from 0 to (N/2)−1  To compute an N-point DFT, we actually perform two N/2-point DFTs—one N/2-point DFT on even-indexed and one N/2-point DFT on odd-indexed x(n) samples ∑∑ ∑∑ ∑∑ − = − = −+ −+ − = ++ − = + − = − = = +−=+ −=−=== → ==== ++=+ ++= → − 1)2/( 0 2/ 1)2/( 0 2/ 2/22/)2/(factortwiddle 2/2/ 2/22 2/ 2/ 2/2/ )2/( 2/ 1)2/( 0 )2/( 2/ )2/( 1)2/( 0 )2/( 2/ 1)2/( 0 2/ 1)2/( 0 2/ )12()2()2/( )1()( )1()( )12()2()2/( )12()2()( )2//(2 2/ N n mn N m N N n mn N m N m N NNjm N N N m N Nm N mn N mn N NNnjmn N Nn N mn N Nmn N N n Nmn N Nm N N n Nmn N N n mn N m N N n mn N eW WnxWWnxNmX WWeWWWW WWeWWWW WnxWWnxNmX WnxWWnxmX Nj N π π π 19 Derivation of Radix-2 FFT Algorithm ∑ ∑ ∑ ∑ − = − = − = − = +− =+ ++ = 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2/ )12( )2()2/( )12( )2()( N n mn N m N N n mn N N n mn N m N N n mn N WnxW WnxNmX WnxW WnxmX 20 Derivation of Radix-2 FFT Algorithm  Twiddle factors  Because −e−j2πm/N = e−j2π(m+N/2)/N, negative W twiddle factors are implemented with positive W twiddle factors that follow the lower DFT in Fig. 4- 2 ∑∑ ∑∑ − = − = − = − = +−=+ ++= 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2/ )12()2()2/( )12()2()( N n mn N m N N n mn N N n mn N m N N n mn N WnxWWnxNmX WnxWWnxmX 21 Derivation of Radix-2 FFT Algorithm ∑∑ ∑∑ ∑∑ ∑ ∑∑ ∑∑ − = − = − = − = = − = + − = − = − = − = − = − = +++= ++= → ++= = −=+ → += → +−=+ ++= 1)4/( 0 4/2/ 1)4/( 0 4/ 1)4/( 0 4/2/ 1)4/( 0 4/ 1)4/( 0 )12( 2/ 1)4/( 0 2 2/ 1)2/( 0 2/ tionsimplifica tionsimplifica 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2/ 1)2/( 0 2/ )34()14()( )24()4()( )24()4( )2()( )()()2/( )()()( )12()2()2/( )12()2()( 4/ 2 2/ N n mn N m N N n mn N N n mn N m N N n mn N WW N n mn N N n mn N N n mn N m N m N N n mn N m N N n mn N N n mn N m N N n mn N WnxWWnxmB WnxWWnxmA WnxWnx WnxmA mBWmANmX mBWmAmX WnxWWnxNmX WnxWWnxmX mn N mn N 22 Derivation of Radix-2 FFT Algorithm ∑ ∑ ∑ ∑ − = − = − = − = ++ += ++ = −=+ += 1)4/( 0 4/2/ 1)4/( 0 4/ 1)4/( 0 4/2/ 1)4/( 0 4/ )34( )14()( )24( )4()( )()()2/( )()()( N n mn N m N N n mn N N n mn N m N N n mn N m N m N WnxW WnxmB WnxW WnxmA mBWmANmX mBWmAmX Twiddle factor for N = 8, ranges from to because the m index, for A(m) and B(m), goes from 0 to 3 m NW 2/ 0 4W 3 4W 23 Derivation of Radix-2 FFT Algorithm  Fig. 4-3  For any N-point DFT, we break each of N/2-point DFTs into two N/4-point DFTs to further reduce the number of sine and cosine multiplications  Eventually, we arrive at an array of 2-point DFTs where no further computational savings could be realized  The 2-point DFT functions cannot be partitioned into smaller parts  Butterfly of a single 2-point DFT is shown in Fig. 4-4 24 Derivation of Radix-2 FFT Algorithm  The 2-point DFT blocks in Fig. 4-3 are replaced by butterfly in Fig. 4-4 to give a full 8-point FFT implementation of DFT as shown in Fig. 4-5 1 1 2/22/ /020 −=== == −− − ππ π jNNjN N Nj N eeW eW 25 Derivation of Radix-2 FFT Algorithm 26 FFT Input/Output Data Index Bit Reversal  Decimation-in-time FFT implementation  Was the title of Fig. 4-5  Decimation-in-time phrase refers to how we broke DFT input samples into odd and even parts  This time decimation leads to scrambled order of input data’s index n in Fig. 4-5  Shuffling of input data is known as bit reversal  Because scrambled order of input data index can be obtained by reversing bits of binary representation of normal input data index order 27 FFT Input/Output Data Index Bit Reversal  Input index bit reversal for an 8-point FFT Normal order of index n Binary bits of index n Reversed bits of index n Bit-reversed order of index n 0 000 000 0 1 001 100 4 2 010 010 2 3 011 110 6 4 100 001 1 5 101 101 5 6 110 011 3 7 111 111 7 28 Radix-2 FFT Butterfly Structures  Twiddle factors in Fig. 4-5  To simplify signal flows, replace twiddle factors with their equivalent values referenced to where N = 8  We show just exponents m of , to get FFT structure shown in Fig. 4-8  Fig. 4-8  from Fig. 4-5   from Fig. 4-5   …  1s and −1s in the first stage of Fig. 4-5 are replaced by 0s and 4s, respectively m NW m NW 1 4W 2 8W 2 4W 4 8W 29 Radix-2 FFT Butterfly Structures 30 Radix-2 FFT Butterfly Structures 31 Radix-2 FFT Butterfly Structures  Fig. 4-9  Input data is in its normal order and output data indices are bit-reversed  In this case, a bit-reversal operation needs to be performed at output of FFT to unscramble frequency-domain results  Fig. 4-10  Shows an FFT signal-flow structure that avoids bit-reversal problem altogether 32 Radix-2 FFT Butterfly Structures 33 Radix-2 FFT Butterfly Structures  Bit reversal  A few years ago, hardware implementations of FFT spent most of their time performing multiplications  Bit-reversal process necessary to access data in memory wasn’t a significant portion of overall FFT computational problem  Now that high-speed multiplier/accumulator integrated circuits can multiply two numbers in a single clock cycle, FFT data multiplexing and memory addressing are more important  Led to development of efficient algorithms to perform bit reversal 34 Radix-2 FFT Butterfly Structures  Decimation-in-frequency algorithm  Decimation-in-time or -frequency is determined by whether the DFT inputs or outputs are partitioned (into odd and even) when deriving a particular FFT butterfly structure from the DFT equations  Decimation-in-frequency butterfly structures (analogous to structures in Figs. 4-8 through 4- 10) are illustrated in Figs. 4-11 through 4-13  An equivalent decimation-in-frequency FFT structure exists for each decimation-in-time FFT structure  The number of necessary multiplications is the same for both structures 35 Radix-2 FFT Butterfly Structures 36 Radix-2 FFT Butterfly Structures 37 Radix-2 FFT Butterfly Structures 38 Alternate Single-Butterfly Structures  Butterfly structures  FFT butterfly structures are direct result of derivations of decimation-in-time and decimationin-frequency algorithms  Twiddle factors always take general forms shown in Fig. 4-14(a) 39 Alternate Single-Butterfly Structures 40 Alternate Single-Butterfly Structures  Fig. 4-14  To implement decimation-in-time butterfly of (a), we have to perform two complex multiplications and two complex additions  So we replace in (a) with to give us simplified butterflies in (b)  Because twiddle factors in (b) differ only by their signs, the optimized butterflies in (c) can be used k N k N NNjk N N N k N Nk N Nk N k N WWeWWWW yWxy yWxx −=−=== → += += −+ + )1()( ' ' 2/22/2/tionsimplifica 2/ π 2/Nk NW + k NW− 41 Alternate Single-Butterfly Structures  Optimized butterflies in 4-14(c)  Require two complex additions but only one complex multiplication, thus reducing computational workload  Because there are (N/2)log2N butterflies in an Npoint FFT, the number of complex multiplications performed by an FFT is (N/2)log2N  An algorithm is decimation-in-time if the twiddle factor precedes the −1 in optimized butterflies  An algorithm is decimation-in-frequency if the twiddle factor follows the −1 in optimized butterflies 42 Alternate Single-Butterfly Structures 43 Alternate Single-Butterfly Structures  In-place FFT algorithms  An in-place algorithm is depicted in Fig. 4-5  Output of a butterfly operation can be stored in the same hardware memory locations that previously held butterfly’s input data  No intermediate storage is necessary  For an N-point FFT, only 2N memory locations are needed  The 2 comes from fact that each butterfly node represents a data value that has both a real and an imaginary part  Data routing and memory addressing are rather complicated 44 Alternate Single-Butterfly Structures  Double-memory FFT algorithms  A double-memory FFT structure is depicted in Fig. 4-10  Intermediate storage is necessary because we no longer have standard butterflies, and 4N memory locations are needed  Data routing and memory address control are much simpler in double-memory FFT structures than in-place technique