Cooley–Tukey FFT algorithm
== Variations ==

There are many other variations on the Cooley–Tukey algorithm. '''Mixed-radix''' implementations handle composite sizes with a variety of (typically small) factors in addition to two, usually (but not always) employing the O(''N''<sup>2</sup>) algorithm for the prime base cases of the recursion <nowiki>(</nowiki>it is also possible to employ an ''N'' log ''N'' algorithm for the prime base cases, such as [[Rader's FFT algorithm|Rader]]'s or [[Bluestein's FFT algorithm|Bluestein]]'s algorithm<nowiki>)</nowiki>. [[Split-radix FFT algorithm|Split radix]] merges radices 2 and 4, exploiting the fact that the first transform of radix 2 requires no twiddle factor, in order to achieve what was long the lowest known arithmetic operation count for power-of-two sizes,<ref name=DuhamelVe90/> although recent variations achieve an even lower count.<ref>Lundy, T., and J. Van Buskirk, "A new matrix approach to real FFTs and convolutions of length 2<sup>''k''</sup>," ''Computing'' '''80''', 23–45 (2007).</ref><ref>Johnson, S. G., and M. Frigo, "[http://www.fftw.org/newsplit.pdf A modified split-radix FFT with fewer arithmetic operations]," ''IEEE Trans. Signal Process.'' '''55''' (1), 111–119 (2007).</ref> (On present-day computers, performance is determined more by [[CPU cache|cache]] and [[CPU pipeline]] considerations than by strict operation counts; well-optimized FFT implementations often employ larger radices and/or hard-coded base-case transforms of significant size.<ref name=FrigoJohnson05/>)

Another way of looking at the Cooley–Tukey algorithm is that it re-expresses a size-''N'' one-dimensional DFT as an ''N''<sub>1</sub> by ''N''<sub>2</sub> two-dimensional DFT (plus twiddles), where the output matrix is [[transpose]]d. The net result of all of these transpositions, for a radix-2 algorithm, corresponds to a bit reversal of the input (DIF) or output (DIT) indices.
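For a radix-2 algorithm, the accumulated transpositions amount to the bit-reversal permutation just described. A minimal sketch in Python (the function name is illustrative, not from any particular library), assuming the transform length is a power of two:

```python
def bit_reverse_indices(n):
    """Return the bit-reversed index permutation for a length-n transform.

    n must be a power of two; index i maps to the integer whose binary
    digits, written in log2(n) bits, are those of i read in reverse order.
    """
    bits = n.bit_length() - 1  # log2(n) for a power of two
    return [int(format(i, "0{}b".format(bits))[::-1], 2) for i in range(n)]

# A decimation-in-time radix-2 FFT consumes its input in this order:
print(bit_reverse_indices(8))  # -> [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the permutation twice returns the natural order, which is why in-place radix-2 FFTs can perform the reordering with simple swaps.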
If, instead of using a small radix, one employs a radix of roughly {{radic|''N''}} and explicit input/output matrix transpositions, it is called a [[four-step FFT]] algorithm (or ''six-step'', depending on the number of transpositions). It was initially proposed to improve memory locality,<ref name=GenSande66>Gentleman W. M., and G. Sande, "Fast Fourier transforms—for fun and profit," ''Proc. AFIPS'' '''29''', 563–578 (1966).</ref><ref name=Bailey90>Bailey, David H., "FFTs in external or hierarchical memory," ''J. Supercomputing'' '''4''' (1), 23–35 (1990).</ref> e.g. for cache optimization or [[out-of-core]] operation, and was later shown to be an optimal [[cache-oblivious algorithm]].<ref name=Frigo99>M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, "Cache-oblivious algorithms," in ''Proceedings of the 40th IEEE Symposium on Foundations of Computer Science'' (FOCS 99), pp. 285–297 (1999). [https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=814600 Extended abstract at IEEE], [http://citeseer.ist.psu.edu/307799.html at Citeseer].</ref>

The general Cooley–Tukey factorization rewrites the indices ''k'' and ''n'' as <math>k = N_2 k_1 + k_2</math> and <math>n = N_1 n_2 + n_1</math>, respectively, where the indices ''k''<sub>''a''</sub> and ''n''<sub>''a''</sub> run from 0 to ''N''<sub>''a''</sub>−1 (for ''a'' of 1 or 2). That is, it re-indexes the input (''n'') and output (''k'') as ''N''<sub>1</sub> by ''N''<sub>2</sub> two-dimensional arrays in [[column-major order|column-major]] and [[row-major order]], respectively; the difference between these indexings is a transposition, as mentioned above.
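The two index maps can be made concrete with a toy size, say ''N''<sub>1</sub> = 3 and ''N''<sub>2</sub> = 4 (values chosen purely for illustration):

```python
# Cooley-Tukey re-indexing for N = N1*N2 = 12 (sizes chosen for illustration).
N1, N2 = 3, 4

# Input n = N1*n2 + n1: the input is viewed as an N1-by-N2 array
# filled in column-major order (consecutive n runs down each column).
n_grid = [[N1 * n2 + n1 for n2 in range(N2)] for n1 in range(N1)]

# Output k = N2*k1 + k2: the output is viewed as an N1-by-N2 array
# filled in row-major order (consecutive k runs along each row).
k_grid = [[N2 * k1 + k2 for k2 in range(N2)] for k1 in range(N1)]

for row in n_grid:
    print(row)  # [0, 3, 6, 9] / [1, 4, 7, 10] / [2, 5, 8, 11]
for row in k_grid:
    print(row)  # [0, 1, 2, 3] / [4, 5, 6, 7] / [8, 9, 10, 11]
```

Both grids contain the indices 0..11 exactly once; the two fill orders differ by a transposition, which is the transposition referred to in the text.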
When this re-indexing is substituted into the DFT formula for ''nk'', the <math>N_1 n_2 N_2 k_1</math> cross term vanishes (its exponential is unity), and the remaining terms give

:<math>X_{N_2 k_1 + k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{N_1 n_2 + n_1} e^{-\frac{2\pi i}{N_1 N_2} \cdot (N_1 n_2 + n_1) \cdot (N_2 k_1 + k_2) }</math>
::<math>= \sum_{n_1=0}^{N_1-1} \left[ e^{-\frac{2\pi i}{N_1 N_2} n_1 k_2 } \right] \left( \sum_{n_2=0}^{N_2-1} x_{N_1 n_2 + n_1} e^{-\frac{2\pi i}{N_2} n_2 k_2 } \right) e^{-\frac{2\pi i}{N_1} n_1 k_1 } </math>
::<math>= \sum_{n_1=0}^{N_1-1} \left( \sum_{n_2=0}^{N_2-1} x_{N_1 n_2 + n_1} e^{-\frac{2\pi i}{N_2} n_2 k_2 } \right) e^{-\frac{2\pi i}{N_1 N_2} n_1 (N_2 k_1 + k_2) } </math>

where each inner sum is a DFT of size ''N''<sub>2</sub>, each outer sum is a DFT of size ''N''<sub>1</sub>, and the <nowiki>[...]</nowiki> bracketed term is the twiddle factor.

An arbitrary radix ''r'' (as well as mixed radices) can be employed, as was shown by both Cooley and Tukey<ref name=CooleyTukey65/> as well as Gauss (who gave examples of radix-3 and radix-6 steps).<ref name=Heideman84/> Cooley and Tukey originally assumed that the radix butterfly required O(''r''<sup>2</sup>) work and hence reckoned the complexity for a radix ''r'' to be O(''r''<sup>2</sup> ''N''/''r'' log<sub>''r''</sub>''N'') = O(''N'' log<sub>2</sub>(''N'') ''r''/log<sub>2</sub>''r''); from calculating ''r''/log<sub>2</sub>''r'' for integer values of ''r'' from 2 to 12, the optimal radix is found to be 3 (the closest integer to ''[[e (mathematical constant)|e]]'', which minimizes ''r''/log<sub>2</sub>''r'').<ref name=CooleyTukey65/><ref>Cooley, J. W., P. Lewis and P. Welch, "The Fast Fourier Transform and its Applications," ''IEEE Trans. on Education'' '''12''' (1), 28–34 (1969).</ref>

This analysis was erroneous, however: the radix butterfly is also a DFT and can be performed via an FFT algorithm in O(''r'' log ''r'') operations, hence the radix ''r'' actually cancels in the complexity O(''r'' log(''r'') ''N''/''r'' log<sub>''r''</sub>''N''), and the optimal ''r'' is determined by more complicated considerations. In practice, quite large values of ''r'' (32 or 64) are important in order to effectively exploit e.g. the large number of [[processor register]]s on modern processors,<ref name=FrigoJohnson05/> and even an unbounded radix ''r'' = {{radic|''N''}} also achieves O(''N'' log ''N'') complexity and has theoretical and practical advantages for large ''N'', as mentioned above.<ref name=GenSande66/><ref name=Bailey90/><ref name=Frigo99/>
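The factorization can be checked numerically. The sketch below (function names are illustrative) computes one Cooley–Tukey decomposition step exactly as in the displayed equation, inner size-''N''<sub>2</sub> DFTs over ''n''<sub>2</sub>, twiddle factors, then outer size-''N''<sub>1</sub> DFTs over ''n''<sub>1</sub>, and compares the result against a naive DFT:

```python
import cmath

def naive_dft(x):
    """Reference O(N^2) DFT: X_k = sum_n x_n * exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def cooley_tukey_step(x, N1, N2):
    """One Cooley-Tukey step for N = N1*N2: inner size-N2 DFTs,
    twiddle factors exp(-2*pi*i*n1*k2/N), then outer size-N1 DFTs."""
    N = N1 * N2
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            outer = 0j
            for n1 in range(N1):
                # inner sum: size-N2 DFT over n2 of the n1-th "column" of the input
                inner = sum(x[N1 * n2 + n1] * cmath.exp(-2j * cmath.pi * n2 * k2 / N2)
                            for n2 in range(N2))
                twiddle = cmath.exp(-2j * cmath.pi * n1 * k2 / N)
                # outer sum: size-N1 DFT over n1, after multiplying by the twiddle
                outer += twiddle * inner * cmath.exp(-2j * cmath.pi * n1 * k1 / N1)
            X[N2 * k1 + k2] = outer  # output stored in row-major order
    return X

x = [complex(j % 5, -j % 3) for j in range(12)]
err = max(abs(a - b) for a, b in zip(cooley_tukey_step(x, 3, 4), naive_dft(x)))
print(err < 1e-9)  # True: the decomposition agrees with the DFT to rounding error
```

Applying `cooley_tukey_step` recursively to the inner and outer sums (instead of evaluating them directly) yields the full O(''N'' log ''N'') algorithm for composite ''N''.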