Commit f43656c

bibikar authored and oleksandr-pavlyk committed
Move everything to root directory of repository
1 parent c42dccf commit f43656c

13 files changed, +134 -89 lines changed
File renamed without changes.

README.md

+134
@@ -0,0 +1,134 @@
# FFT benchmarks for Intel(R) Distribution for Python\*

This set of benchmarks measures the performance of FFT computations and serves
to highlight the FFT performance improvements in NumPy and SciPy in
the Intel(R) Distribution for Python\*. We provide both Python and native
(MKL DFTI) implementations of these benchmarks with similar command-line
interfaces.

## Python benchmarks

To reproduce, install Intel(R) Distribution for Python\* as follows:

```bash
conda create -n 'idp3_fft' -c intel numpy scipy
conda activate idp3_fft
```

To benchmark FFT in Python, execute:

```bash
python fft_bench.py [-h] [args] size
```

The methodology is to perform one unmeasured warm-up computation and then take
24 timings, each covering a loop of 16 repeated FFT computations. The 24
measurements are aggregated into minimum, median, and maximum timings,
which are printed to STDOUT.

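The timing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in `fft_bench.py`: the function name `time_fft` and its parameters are hypothetical, and dividing each timing by the inner loop count is an assumption about how a measurement is reported.

```python
import statistics
import time


def time_fft(compute, outer_loops=24, inner_loops=16):
    """Time `compute` with one warm-up call and nested timing loops.

    `compute` is any zero-argument callable, for example
    `lambda: numpy.fft.fft(x)`. The name and signature are hypothetical;
    they are not taken from fft_bench.py.
    """
    compute()  # one unmeasured warm-up computation

    timings = []
    for _ in range(outer_loops):
        start = time.perf_counter()
        for _ in range(inner_loops):
            compute()
        elapsed = time.perf_counter() - start
        # Assumption: a measurement is the mean time per inner-loop call.
        timings.append(elapsed / inner_loops)

    return min(timings), statistics.median(timings), max(timings)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without NumPy installed.
    data = list(range(10_000))
    t_min, t_med, t_max = time_fft(lambda: sum(data))
    print(f"min={t_min:.3e}s median={t_med:.3e}s max={t_max:.3e}s")
```

Keeping the warm-up call outside the timed region means one-time setup costs do not distort the first measurement.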
Other printed lines that start with 'TAG: ' are for information only
and can be filtered out if need be.

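One way to drop the informational lines when post-processing captured output is sketched below. The helper name `drop_tag_lines` is hypothetical, and the sample lines are made up; only the `'TAG: '` prefix comes from the description above.

```python
def drop_tag_lines(lines, prefix="TAG: "):
    """Return only the lines that do not start with the informational prefix."""
    return [line for line in lines if not line.startswith(prefix)]


if __name__ == "__main__":
    # Made-up sample output; the real lines printed by fft_bench.py may differ.
    sample = [
        "TAG: example informational line",
        "example measurement line",
    ]
    for line in drop_tag_lines(sample):
        print(line)
```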
### Examples

Benchmark a 2D out-of-place FFT of a `complex128` array of size `(10000,
10000)`:
```
python fft_bench.py 10000x10000
```

Benchmark a 1D in-place FFT of a `float32` array of size `100000000`, print
only 5 measurements, compute only the first half of the conjugate-even
DFT coefficients, and allow the FFT backend to use only one thread:
```
python fft_bench.py -P -r -t 1 -d float32 -o 5 100000000
```

Benchmark a 3D in-place FFT of a `complex64` array of size `1001x203x3005`,
printing only 5 measurements, each of which averages over 24 inner loop
computations:
```
python fft_bench.py -P -d complex64 -o 5 -i 24 1001x203x3005
```

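The "first half of the conjugate-even DFT coefficients" mentioned for the `-r` option refers to a property of real inputs: the DFT of a real sequence of length `n` satisfies `X[n - k] == conj(X[k])`, so coefficients `0` through `n // 2` determine the rest. The sketch below illustrates this with a slow, direct DFT in pure Python. It is for illustration only and does not show how the benchmark computes anything; in NumPy, `numpy.fft.rfft` returns these `n // 2 + 1` coefficients. All function names here are hypothetical.

```python
import cmath


def naive_dft(x, n_out=None):
    """Direct O(n^2) DFT returning the first n_out coefficients (all by default)."""
    n = len(x)
    if n_out is None:
        n_out = n
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n_out)
    ]


def half_spectrum(x):
    """Coefficients 0 .. n // 2 of a real sequence, n // 2 + 1 values in total."""
    return naive_dft(x, len(x) // 2 + 1)


def full_from_half(half, n):
    """Rebuild all n coefficients using X[n - k] == conj(X[k])."""
    tail = [half[n - k].conjugate() for k in range(len(half), n)]
    return list(half) + tail


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
    half = half_spectrum(signal)
    rebuilt = full_from_half(half, len(signal))
    print(len(signal), "real samples ->", len(half), "stored coefficients")
    print("rebuilt spectrum length:", len(rebuilt))
```

Skipping the redundant half roughly halves the output size, which is why a real-input transform is benchmarked separately from the complex one.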
## Native benchmarks

### Compiling on Linux
- To compile, source the compiler environment and run `make`.
- Run with `./fft_bench`.

### Compiling on Windows
- Set up the compiler and MKL environments, then run `win_compile_all.bat`.
```
> "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\bin\compilervars.bat" intel64
> "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\bin\mklvars.bat" intel64
> win_compile_all.bat
```
- To run the benchmark, use `fft_bench.exe`. Note that long options are not
  supported on Windows. Use short options instead.

### Examples

Benchmark a 2D out-of-place FFT of a `complex128` array of size `(10000,
10000)`:
```
./fft_bench 10000x10000
```

Benchmark a 1D in-place FFT of a `float32` array of size `100000000`, print
only 5 measurements, compute only the first half of the conjugate-even
DFT coefficients, and allow the FFT backend to use only one thread:
```
./fft_bench -P -r -t 1 -d float32 -o 5 100000000
```

Benchmark a 3D in-place FFT of a `complex64` array of size `1001x203x3005`,
printing only 5 measurements, each of which averages over 24 inner loop
computations:
```
./fft_bench -P -d complex64 -o 5 -i 24 1001x203x3005
```

### Usage

```
usage: ./fft_bench [args] size
Benchmark FFT using Intel(R) MKL DFTI.

FFT problem arguments:
  -t, --threads=THREADS    use THREADS threads for FFT execution
                           (default: use MKL's default)
  -d, --dtype=DTYPE        use DTYPE as the FFT domain. For a list of
                           understood dtypes, use '-d help'.
                           (default: complex128)
  -r, --rfft               do not copy superfluous harmonics when FFT
                           output is even-conjugate, i.e. for real inputs
  -P, --in-place           allow overwriting the input buffer with the
                           FFT outputs
  -c, --cached             use the same DFTI descriptor for the same
                           outer loop, i.e. "cache" the descriptor

Timing arguments:
  -i, --inner-loops=IL     time the benchmark IL times for each printed
                           measurement. Copies are not included in the
                           measurements. (default: 16)
  -o, --outer-loops=OL     print OL measurements. (default: 5)

Output arguments:
  -p, --prefix=PREFIX      output PREFIX as the first value in outputs
                           (default: 'Native-C')
  -H, --no-header          do not output CSV header. This can be useful
                           if running multiple benchmarks back-to-back.
  -h, --help               print this message and exit

The size argument specifies the input matrix size as a tuple of positive
decimal integers, delimited by any non-digit. For example, both
(101, 203, 305) and 101x203x305 denote the same 3D FFT.
```

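The size rule stated at the end of the usage text (positive decimal integers delimited by any non-digit) can be sketched in Python as below. This is an illustration of the rule only, not the parser used by either benchmark; the function name `parse_size` is hypothetical.

```python
import re


def parse_size(text):
    """Parse a size such as '101x203x305' or '(101, 203, 305)' into a tuple."""
    # Every maximal run of ASCII digits is one dimension; anything else delimits.
    dims = tuple(int(token) for token in re.findall(r"[0-9]+", text))
    if not dims or any(d <= 0 for d in dims):
        raise ValueError(f"invalid FFT size: {text!r}")
    return dims


if __name__ == "__main__":
    for text in ("(101, 203, 305)", "101x203x305", "100000000"):
        print(text, "->", parse_size(text))
```

Because any non-digit acts as a delimiter, a sign character is treated as a separator rather than as part of a number under this reading of the rule.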
## See also
"[Accelerating Scientific Python with Intel
Optimizations](http://conference.scipy.org/proceedings/scipy2017/pdfs/oleksandr_pavlyk.pdf)"
by Oleksandr Pavlyk, Denis Nagorny, Andres Guzman-Ballen, Anton Malakhov, Hai
Liu, Ehsan Totoni, Todd A. Anderson, and Sergey Maidanov. Proceedings of the 16th
Python in Science Conference (SciPy 2017), July 10 - July 16, Austin, Texas.
File renamed without changes.
File renamed without changes.
File renamed without changes.

native/README.md

-52
This file was deleted.

python/perf.py renamed to perf.py

File renamed without changes.

python/.gitignore

-2
This file was deleted.

python/README.md

-35
This file was deleted.
File renamed without changes.
File renamed without changes.
