Notes on FFT and NTT
Translated by ChatGPT.
The following is my intuitive understanding of FFT. It may not be rigorous; corrections are welcome if there are mistakes.
FFT
The algorithm discussed below is the Cooley-Tukey FFT, which is more widely used in competitive programming.
Prerequisite: complex numbers, and an understanding of Euler’s formula.
Polynomial multiplication
For degree- polynomials
Their convolution is , where
So a naive polynomial convolution needs coefficient multiplications, and we need to optimize it.
Point-value representation
A degree- polynomial can be determined by coefficients, and it can also be determined by coordinates (point values). In other words, a degree- polynomial can be viewed as an -dimensional vector.
Consider choosing coordinates to determine and . Then can be obtained simply by doing multiplications:
Now we have a new idea: first convert from coefficient representation to point-value representation, do the multiplication, and then convert back.
DFT
How do we convert a polynomial into point values? We have the discrete Fourier transform.
The solutions to the equation are called the -th roots of unity . For a given polynomial and a root of unity , define the vector
as the discrete Fourier transform of .
DFT has an inverse transform (IDFT), which converts point values back into coefficients. It is still a vector-to-vector transform.
IDFT has a key property:
We will prove it later. For now, we can handle DFT and IDFT uniformly.
For convenience, in the following we write simply as .
Primitive root of unity
At this point, the complexity of computing DFT is still . The key step of FFT is to choose special points to speed up the computation.
One special root of unity is denoted , called a primitive root of unity. By Euler’s formula,
That is, is a point on the unit circle, and all roots of unity
correspond exactly to the equally spaced points on the unit circle. Therefore, by Euler’s formula, multiplication among roots of unity is just rotating around the unit circle.
It is not hard to verify several properties of the primitive root using Euler’s formula:
- .
- .
Divide and conquer
Using the special properties of primitive roots of unity, we can compute DFT by divide and conquer. For example, for a degree- polynomial:
Separate odd and even terms:
Then the original function can be written as
In general, for a polynomial of degree less than , its value at the root of unity is
Similarly,
In DFT form:
Therefore, we need to pad the number of polynomial coefficients up to , which makes divide and conquer convenient.
Now we can write the recursive FFT.
void fft(int n, img *f, int op) {
static img tmp[1 << 18];
if (n == 1)
return;
for (int i = 0; i < n; i++)
tmp[i] = f[i];
for (int i = 0; i < n; i++) { // put even indices on the left, odd indices on the right
if (i & 1)
f[n / 2 + i / 2] = tmp[i];
else
f[i / 2] = tmp[i];
}
img *g = f, *h = f + n / 2;
fft(n / 2, g, op), fft(n / 2, h, op);
img w0 = {cos(2 * PI / n), sin(2 * PI * op / n)}, w = {1, 0};
for (int k = 0; k < n / 2; k++) {
tmp[k] = g[k] + w * h[k];
tmp[k + n / 2] = g[k] - w * h[k];
w = w * w0;
}
for (int i = 0; i < n; i++)
f[i] = tmp[i];
}
Butterfly transform
Recursive divide and conquer is never quite satisfactory. In the first few lines we are only doing recursive grouping, so we can consider doing it in one step.
Again take a degree- polynomial as an example:
- Initial:
- Once:
- Twice:
- End:
Writing them in binary, we find that the binary representation at the end is exactly the reverse of the beginning.
| Initial | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Initial(2) | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| End(2) | 000 | 100 | 010 | 110 | 001 | 101 | 011 | 111 |
| End | 0 | 4 | 2 | 6 | 1 | 5 | 3 | 7 |
This transform is called the butterfly transform, also known as the bit-reversal permutation.
We can preprocess the permutation array in . Let R(x) be the transformed result of ; then R(x >> 1) is already known. We just shift R(x >> 1) right by one and fill in the highest bit. The code is:
void pre_rev(int lim) {
int k = std::__lg(lim);
rev.resize(lim);
for (int i = 0; i < lim; ++i) {
rev[i] = rev[i >> 1] >> 1;
if (i & 1)
rev[i] |= lim >> 1;
// or write it in one line as
// rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
}
}
Now we can write the iterative FFT.
void fft(img *f, int n, int op) { // DIT
for (int i = 0; i < n; ++i)
if (i < rev[i])
swap(f[i], f[rev[i]]);
for (int l = 1; l <= n / 2; l <<= 1) {
img w0 = {cos(PI / l), sin(PI * op / l)};
for (int i = 0; i < n; i += l * 2) {
img w = {1, 0};
for (int j = 0; j < l; j++) {
img x = f[i + j], y = w * f[i + j + l];
f[i + j] = x + y, f[i + j + l] = x - y;
w = w * w0;
}
}
}
if (op == -1)
for (int i = 0; i < n; i++)
f[i] = f[i] / n;
}
NTT
Prerequisite: elementary number theory (divisibility and congruence).
Using double to implement integer multiplication is inelegant, with issues in both precision and speed. In fact, we can do everything over integers.
Primitive root
The two essential properties of the primitive root of unity that we use are:
- .
- .
This brings to mind the residue field modulo : its elements are , and all operations on it are modulo . By Fermat’s little theorem,
In other words, the positive integers are all solutions of the congruence equation .
This has a form very similar to roots of unity. Intuitively, should also contain special numbers similar to primitive roots of unity. Below we discuss this over and try to prove such a number exists.
Define the order of a positive integer as the smallest such that . By Fermat’s little theorem, , so the order of must exist and satisfies . It can be proven that
have pairwise distinct residues modulo . By Lagrange’s theorem, has at most solutions, exactly the ones shown in .
From divisibility, only gives . That is, always brings along
elements with the same order. Therefore, there are exactly numbers of order .
Since every positive integer has a uniquely determined order, suppose that for every , there exist corresponding integers of order . Counting the integers gives
which is exactly the number of all positive integers in . Therefore the assumption holds, and there exists an such that .
We call this a primitive root modulo , usually denoted by .
Number Theoretic Transform
Extract as many factors of from as possible:
Let be a primitive root of , and view as the analogue of . Using quadratic residues, it is not hard to obtain and .
Common examples are
Similarly, we can write the program:
void ntt(ll *f, int n, int type) {
for (int i = 0; i < n; ++i)
if (i < rev[i])
swap(f[i], f[rev[i]]);
for (int h = 2; h < n; h <<= 1) {
ll tg = type == 1 ? 3 : g_inv;
ll gn = qpow(tg, (P - 1) / h);
for (int j = 0; j < n; j += h) {
ll g = 1;
for (int k = j; k < j + h / 2; k++) {
ll f1 = f[k], f2 = g * f[k + h / 2] % P;
f[k] = (f1 + f2) % P;
f[k + h / 2] = (f1 - f2 + P) % P;
g = g * gn % P;
}
}
}
ll iv_n = qpow(n);
if (type == -1)
for (int i = 0; i < n; i++)
f[i] = f[i] * iv_n % P;
}
At this point, you have learned FFT. Next we will study FFT more deeply from a mathematical perspective and fill in the theoretical foundation.
Linear transform
DFT is a linear transform. In other words, it can be written as matrix multiplication:
Denote the middle -order Vandermonde matrix by .
Directly computing the inverse of is hard, but it is easy to verify that the following is a diagonal matrix:
Thus the matrix corresponding to IDFT is , proving equation .
Removing REV
Actually, the FFT and IFFT implemented above are not dual to each other; it is just the convolution theorem that makes IFFT exactly the inverse operation of FFT. More specifically, we implemented two DITs, so butterfly permutation is needed before computation.

The core of the computation lies in equation , which can be written in matrix form:
Taking the inverse of the matrix gives
This gives DIF. Similarly, we can implement two DIFs as FFT; in that case the butterfly permutation is performed after computation.
void fft(img *f, int n, int op) { // DIF
for (int l = n / 2; l >= 1; l >>= 1) {
img w0 = {cos(PI / l), sin(PI * op / l)};
for (int i = 0; i < n; i += l * 2) {
img w = {1, 0};
for (int j = 0; j < l; j++) {
img x = f[i + j], y = f[i + j + l];
f[i + j] = x + y, f[i + j + l] = w * (x - y);
w = w * w0;
}
}
}
for (int i = 0; i < n; ++i)
if (i < rev[i])
swap(f[i], f[rev[i]]);
if (op == -1)
for (int i = 0; i < n; i++)
f[i] = f[i] / n;
}
It is easy to see that if we use DIF as FFT and DIT as IFFT, no butterfly permutation is needed.
void fft(img *f, int n) {
for (int l = n / 2; l >= 1; l >>= 1) {
img w0 = {cos(PI / l), sin(PI / l)};
for (int i = 0; i < n; i += l * 2) {
img w = {1, 0};
for (int j = 0; j < l; j++) {
img x = f[i + j], y = f[i + j + l];
f[i + j] = x + y, f[i + j + l] = w * (x - y);
w = w * w0;
}
}
}
}
void ifft(img *f, int n) {
for (int l = 1; l <= n / 2; l <<= 1) {
img w0 = img{cos(PI / l), sin(PI / l)}.conj();
for (int i = 0; i < n; i += l * 2) {
img w = {1, 0};
for (int j = 0; j < l; j++) {
img x = f[i + j], y = w * f[i + j + l];
f[i + j] = x + y, f[i + j + l] = x - y;
w = w * w0;
}
}
}
for (int i = 0; i < n; i++)
f[i] = f[i] / n;
}
This is Twisted FFT.
Another interpretation
Notice that
We can start from this and reexamine the algorithm from the perspective of modulo. Suppose can be decomposed as
Let
Then
Notice that in the code we do not compute directly. Instead, we multiply the -th term by , which is equivalent to computing .
We can find that
This diagram is not easy to understand. The Original FFT diagram below is easier, but the algorithm widely used today is Twisted FFT.

From the diagram, the FFT process is to push the polynomial from the root to the leaves, obtaining values at all roots of unity. After doing the operation, it pushes from the leaves back to the root.
Original FFT
Of course, we can directly divide and conquer; this is Original FFT.

Due to space, this post will not expand on it.
Preprocessing roots of unity
Recomputing the roots of unity every time is wasteful. We can preprocess them and then use them during computation.
vector<img> ROOT;
void init(int n) {
static int lim = (ROOT = {{1, 0}}, 1);
if (lim >= n)
return;
ROOT.resize(n);
for (int l = lim; l < n; l *= 2) {
img w = {cos(PI / l / 2), sin(PI / l / 2)};
ROOT[l] = w;
for (int i = 1; i < l; ++i)
ROOT[i + l] = ROOT[i] * w;
}
lim = n;
}
Other applications
FFT is essentially a tool for quickly computing convolution. In this article, I want to focus more on understanding the computation process of FFT.
FFT has many other applications, such as fast addition and wildcard pattern matching. Later, after solving more problems, I may write another post about them. For now, I have not accumulated enough.