diff --git a/native/Makefile b/Makefile
rename from native/Makefile
rename to Makefile
diff --git a/native/fft_bench.c b/fft_bench.c
rename from native/fft_bench.c
rename to fft_bench.c
diff --git a/python/fft_bench.py b/fft_bench.py
rename from python/fft_bench.py
rename to fft_bench.py
diff --git a/native/moments.h b/moments.h
rename from native/moments.h
rename to moments.h
diff --git a/native/README.md b/native/README.md
diff --git a/python/perf.py b/perf.py
rename from python/perf.py
rename to perf.py
diff --git a/python/.gitignore b/python/.gitignore
diff --git a/python/README.md b/python/README.md
diff --git a/python/scipy_paper/fft_in-out-place.py b/scipy_paper/fft_in-out-place.py
rename from python/scipy_paper/fft_in-out-place.py
rename to scipy_paper/fft_in-out-place.py
diff --git a/python/scipy_paper/fft_in-out-place_single.py b/scipy_paper/fft_in-out-place_single.py
rename from python/scipy_paper/fft_in-out-place_single.py
rename to scipy_paper/fft_in-out-place_single.py
diff --git a/python/scipy_paper/fft_strides.py b/scipy_paper/fft_strides.py
rename from python/scipy_paper/fft_strides.py
rename to scipy_paper/fft_strides.py
diff --git a/native/win_compile_all.bat b/win_compile_all.bat
rename from native/win_compile_all.bat
rename to win_compile_all.bat
diff --git a/README.md b/README.md
@@ -0,0 +1,134 @@
+# FFT benchmarks for Intel(R) Distribution for Python\*
+
+This set of benchmarks measures the performance of FFT computations and
+highlights the FFT performance improvements to NumPy and SciPy in the
+Intel(R) Distribution for Python\*. Both Python and native (Intel(R) MKL
+DFTI) implementations of the benchmarks are provided, with similar
+command-line interfaces.
+
+## Python benchmarks
+
+To reproduce, install NumPy and SciPy from the Intel(R) Distribution for
+Python\* into a new conda environment:
+
+```bash
+conda create -n 'idp3_fft' -c intel numpy scipy
+conda activate idp3_fft
+```
+
+To benchmark FFT in Python, run:
+
+```bash
+python fft_bench.py [-h] [args] size
+```
+
+The benchmark first performs one unmeasured (warm-up) computation, then takes
+24 timings, each covering 16 repetitions of the FFT computation in a loop.
+The 24 measurements are aggregated into minimum, median, and maximum timings,
+which are printed to STDOUT.
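+
+The timing methodology above can be sketched in plain Python as follows.
+This is a minimal illustration, not the code in `fft_bench.py`: the helper
+name `time_fft` is hypothetical, and each measurement is assumed to be the
+mean time of one FFT call over the inner loop (total loop time divided by
+the number of inner repetitions).
+
+```python
+import statistics
+import time
+from typing import Callable, Dict, List
+
+
+def time_fft(compute: Callable[[], object],
+             outer_loops: int = 24,
+             inner_loops: int = 16) -> Dict[str, float]:
+    """Hypothetical helper: warm up once, then time outer x inner calls."""
+    compute()  # one unmeasured warm-up computation
+
+    measurements: List[float] = []
+    for _ in range(outer_loops):
+        start = time.perf_counter()
+        for _ in range(inner_loops):
+            compute()
+        # Assumption: each measurement is the mean time per call in this loop.
+        measurements.append((time.perf_counter() - start) / inner_loops)
+
+    # Aggregate all measurements into min / median / max.
+    return {
+        "min": min(measurements),
+        "median": statistics.median(measurements),
+        "max": max(measurements),
+    }
+```
+
+For example, `time_fft(lambda: numpy.fft.fftn(x))` would time a 2D FFT of a
+prepared NumPy array `x`. Building `x` before the call keeps input setup out
+of the timed region.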
+
+Other printed lines that start with `TAG: ` are informational only and can be
+filtered out if needed.
+
+### Examples
+
+Benchmark a 2D out-of-place FFT of a `complex128` array of size
+`(10000, 10000)`:
+```
+python fft_bench.py 10000x10000
+```
+
+Benchmark a 1D in-place FFT of a `float32` array of size `100000000`, printing
+only 5 measurements, computing only the first half of the conjugate-even DFT
+coefficients, and restricting the FFT backend to a single thread:
+```
+python fft_bench.py -P -r -t 1 -d float32 -o 5 100000000
+```
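+
+The `-r` option exploits the conjugate-even symmetry of the DFT of real input:
+for real `x` of length `n`, the coefficients satisfy `X[n - k] == conj(X[k])`,
+so only the first `n // 2 + 1` of them are independent. The sketch below
+checks this with a naive O(n^2) DFT from the standard library; it is an
+illustration only, not the FFT algorithm the benchmark uses, and the function
+names are hypothetical.
+
+```python
+import cmath
+from typing import List, Sequence
+
+
+def dft(x: Sequence[complex]) -> List[complex]:
+    """Naive O(n^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
+    n = len(x)
+    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
+                for j in range(n))
+            for k in range(n)]
+
+
+def half_spectrum(x: Sequence[float]) -> List[complex]:
+    """Keep the first n // 2 + 1 coefficients of a real signal's DFT."""
+    return dft(x)[: len(x) // 2 + 1]
+
+
+def full_from_half(half: Sequence[complex], n: int) -> List[complex]:
+    """Rebuild all n coefficients using X[n - k] = conj(X[k])."""
+    full = list(half) + [0j] * (n - len(half))
+    for k in range(len(half), n):
+        full[k] = half[n - k].conjugate()
+    return full
+```
+
+For the `100000000`-point example above, this means keeping 50000001 of the
+100000000 coefficients.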
+
+Benchmark a 3D in-place FFT of a `complex64` array of size `1001x203x3005`,
+printing only 5 measurements, each of which averages over 24 inner-loop
+computations:
+```
+python fft_bench.py -P -d complex64 -o 5 -i 24 1001x203x3005
+```
+
+## Native benchmarks
+
+### Compiling on Linux
+- To compile, source the compiler environment script, then run `make`.
+- Run with `./fft_bench`.
+
+### Compiling on Windows
+- Source the compiler and MKL environment scripts, then run
+  `win_compile_all.bat`:
+  ```
+  > "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\bin\compilervars.bat" intel64
+  > "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\bin\mklvars.bat" intel64
+  > win_compile_all.bat
+  ```
+- Run with `fft_bench.exe`. Long options are not supported on Windows; use
+  the short options instead.
+
+### Examples
+
+Benchmark a 2D out-of-place FFT of a `complex128` array of size
+`(10000, 10000)`:
+```
+./fft_bench 10000x10000
+```
+
+Benchmark a 1D in-place FFT of a `float32` array of size `100000000`, printing
+only 5 measurements, computing only the first half of the conjugate-even DFT
+coefficients, and restricting the FFT backend to a single thread:
+```
+./fft_bench -P -r -t 1 -d float32 -o 5 100000000
+```
+
+Benchmark a 3D in-place FFT of a `complex64` array of size `1001x203x3005`,
+printing only 5 measurements, each of which averages over 24 inner-loop
+computations:
+```
+./fft_bench -P -d complex64 -o 5 -i 24 1001x203x3005
+```
+
+### Usage
+
+```
+usage: ./fft_bench [args] size
+Benchmark FFT using Intel(R) MKL DFTI.
+
+FFT problem arguments:
+  -t, --threads=THREADS    use THREADS threads for FFT execution
+                           (default: use MKL's default)
+  -d, --dtype=DTYPE        use DTYPE as the FFT domain. For a list of
+                           understood dtypes, use '-d help'.
+                           (default: complex128)
+  -r, --rfft               do not copy superfluous harmonics when FFT
+                           output is even-conjugate, i.e. for real inputs
+  -P, --in-place           allow overwriting the input buffer with the
+                           FFT outputs
+  -c, --cached             use the same DFTI descriptor for the same
+                           outer loop, i.e. "cache" the descriptor
+
+Timing arguments:
+  -i, --inner-loops=IL     time the benchmark IL times for each printed
+                           measurement. Copies are not included in the
+                           measurements. (default: 16)
+  -o, --outer-loops=OL     print OL measurements. (default: 5)
+
+Output arguments:
+  -p, --prefix=PREFIX      output PREFIX as the first value in outputs
+                           (default: 'Native-C')
+  -H, --no-header          do not output CSV header. This can be useful
+                           if running multiple benchmarks back-to-back.
+  -h, --help               print this message and exit
+
+The size argument specifies the input matrix size as a tuple of positive
+decimal integers, delimited by any non-digit. For example, both
+(101, 203, 305) and 101x203x305 denote the same 3D FFT.
+```
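+
+The rule for the `size` argument can be sketched in Python with a regular
+expression: every run of decimal digits is one dimension, and any other
+character is a delimiter. `parse_size` is a hypothetical helper for
+illustration, not necessarily how `fft_bench.c` or `fft_bench.py` parse it.
+
+```python
+import re
+from typing import Tuple
+
+
+def parse_size(text: str) -> Tuple[int, ...]:
+    """Split e.g. '101x203x305' or '(101, 203, 305)' into integers."""
+    # Each maximal run of ASCII digits is one dimension; everything else
+    # (x, commas, spaces, parentheses, ...) acts as a delimiter.
+    dims = tuple(int(run) for run in re.findall(r"[0-9]+", text))
+    if not dims:
+        raise ValueError(f"no dimensions found in {text!r}")
+    if any(d <= 0 for d in dims):
+        raise ValueError(f"dimensions must be positive: {text!r}")
+    return dims
+```
+
+With this rule, `(101, 203, 305)` and `101x203x305` both yield
+`(101, 203, 305)`, and the number of FFT dimensions equals the number of
+digit runs in the argument.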
+
+## See also
+"[Accelerating Scientific Python with Intel
+Optimizations](http://conference.scipy.org/proceedings/scipy2017/pdfs/oleksandr_pavlyk.pdf)"
+by Oleksandr Pavlyk, Denis Nagorny, Andres Guzman-Ballen, Anton Malakhov, Hai
+Liu, Ehsan Totoni, Todd A. Anderson, Sergey Maidanov. Proceedings of the 16th
+Python in Science Conference (SciPy 2017), July 10 - July 16, Austin, Texas